[57] ABSTRACT

This invention relates to customized electronic identification 
of desirable objects, such as news articles, in an electronic 
media environment, and in particular to a system that 
automatically constructs both a "target profile" for each 
target object in the electronic media based, for example, on 
the frequency with which each word appears in an article 
relative to its overall frequency of use in all articles, as well 
as a "target profile interest summary" for each user, which 
target profile interest summary describes the user's interest 
level in various types of target objects. The system then
evaluates the target profiles against the users' target profile
interest summaries to generate a user-customized rank
ordered listing of target objects most likely to be of interest
to each user so that the user can select from among these
potentially relevant target objects, which were automatically
selected by this system from the plethora of target objects
that are profiled on the electronic media. Users' target profile
interest summaries can be used to efficiently organize the
distribution of information in a large scale system consisting 
of many users interconnected by means of a communication 
network. Additionally, a cryptographically-based pseud- 
onym proxy server is provided to ensure the privacy of a 
user's target profile interest summary, by giving the user
control over the ability of third parties to access this sum- 
mary and to identify or contact the user. 

15 Claims, 13 Drawing Sheets
SYSTEM FOR CUSTOMIZED ELECTRONIC
IDENTIFICATION OF DESIRABLE OBJECTS

CROSS-REFERENCE TO RELATED
APPLICATIONS

This patent application was originally filed as Provisional
Patent Application Ser. No. 60/032,461 on Dec. 9, 1996 and
is a continuation-in-part of U.S. patent application Ser. No.
08/346,425, filed Nov. 29, 1994, now U.S. Pat. No. 5,758,257
and titled "SYSTEM AND METHOD FOR SCHEDULING
BROADCAST OF AND ACCESS TO VIDEO
PROGRAMS AND OTHER DATA USING CUSTOMER
PROFILES", which application is assigned to the same
assignee as the present application.

FIELD OF INVENTION

This invention relates to customized electronic identification
of desirable objects, such as news articles, in an
electronic media environment, and in particular to a system
that automatically constructs both a "target profile" for each
target object in the electronic media based, for example, on
the frequency with which each word appears in an article
relative to its overall frequency of use in all articles, as well
as a "target profile interest summary" for each user, which
target profile interest summary describes the user's interest
level in various types of target objects. The system then
evaluates the target profiles against the users' target profile
interest summaries to generate a user-customized rank
ordered listing of target objects most likely to be of interest
to each user so that the user can select from among these
potentially relevant target objects, which were automatically
selected by this system from the plethora of target objects
that are profiled on the electronic media. Users' target profile
interest summaries can be used to efficiently organize the
distribution of information in a large scale system consisting
of many users interconnected by means of a communication
network. Additionally, a cryptographically based proxy
server is provided to ensure the privacy of a user's target
profile interest summary, by giving the user control over the
ability of third parties to access this summary and to identify
or contact the user.

PROBLEM

It is a problem in the field of electronic media to enable
a user to access information of relevance and interest to the
user without requiring the user to expend an excessive
amount of time and energy searching for the information.
Electronic media, such as on-line information sources, provide
a vast amount of information to users, typically in the
form of "articles," each of which comprises a publication
item or document that relates to a specific topic. The
difficulty with electronic media is that the amount of information
available to the user is overwhelming and the article
repository systems that are connected on-line are not organized
in a manner that sufficiently simplifies access to only
the articles of interest to the user. Presently, a user either fails
to access relevant articles because they are not easily identified
or expends a significant amount of time and energy to
conduct an exhaustive search of all articles to identify those
most likely to be of interest to the user. Furthermore, even
if the user conducts an exhaustive search, present information
searching techniques do not necessarily accurately
extract only the most relevant articles, but also present
articles of marginal relevance due to the functional limitations
of the information searching techniques. There is also
no existing system which automatically estimates the inherent
quality of an article or other target object to distinguish
among a number of articles or target objects identified as of
possible interest to a user.

Therefore, in the field of information retrieval, there is a
long-standing need for a system which enables users to
navigate through the plethora of information. With commercialization
of communication networks, such as the Internet,
the growth of available information has increased. Customization
of the information delivery process to the user's
unique tastes and interests is the ultimate solution to this
problem. However, the techniques which have been proposed
to date either only address the user's interests on a
superficial level or provide greater depth and intelligence at
the cost of unwanted demands on the user's time and energy.
While many researchers have agreed that traditional methods
have been lacking in this regard, no one to date has
successfully addressed these problems in a holistic manner
and provided a system that can fully learn and reflect the
user's tastes and interests. This is particularly true in a
practical commercial context, such as on-line services available
on the Internet. There is a need for an information
retrieval system that is largely or entirely passive,
unobtrusive, undemanding of the user, and yet both precise
and comprehensive in its ability to learn and truly represent
the user's tastes and interests. Present information retrieval
systems require the user to specify the desired information
retrieval behavior through cumbersome interfaces.

Users may receive information on a computer network
either by actively retrieving the information or by passively
receiving information that is sent to them. Just as users of
information retrieval systems face the problem of too much
information, so do users who are targeted with electronic
junk mail by individuals and organizations. An ideal system
would protect the user from unsolicited advertising, both by
automatically extracting only the most relevant messages
received by electronic mail, and by preserving the confidentiality
of the user's preferences, which should not be
freely available to others on the network.

Researchers in the field of published article information
retrieval have devoted considerable effort to finding efficient
and accurate methods of allowing users to select articles of
interest from a large set of articles. The most widely used
methods of information retrieval are based on keyword
matching: the user specifies a set of keywords which the user
thinks are exclusively found in the desired articles and the
information retrieval computer retrieves all articles which
contain those keywords. Such methods are fast, but are
notoriously unreliable, as users may not think of the right
keywords, or the keywords may be used in unwanted articles
in an irrelevant or unexpected context. As a result, the
information retrieval computers retrieve many articles
which are unwanted by the user. The logical combination of
keywords and the use of wild-card search parameters help
improve the accuracy of keyword searching but do not
completely solve the problem of inaccurate search results.
Starting in the 1960's, an alternate approach to information
retrieval was developed: users were presented with an article
and asked if it contained the information they wanted, or to
quantify how close the information contained in the article
was to what they wanted. Each article was described by a
profile which comprised either a list of the words in the
article or, in more advanced systems, a table of word
frequencies in the article. Since a measure of similarity
between articles is the distance between their profiles, the
measured similarity of article profiles can be used in article
retrieval. For example, a user searching for information on
a subject can write a short description of the desired information.
The information retrieval computer generates an
article profile for the request and then retrieves articles with
profiles similar to the profile generated for the request. These
requests can then be refined using "relevance feedback",
where the user actively or passively rates the articles
retrieved as to how close the information contained therein
is to what is desired. The information retrieval computer
then uses this relevance feedback information to refine the
request profile and the process is repeated until the user
either finds enough articles or tires of the search.

A number of researchers have looked at methods for
selecting articles of most interest to users. An article titled
"Social Information filtering: algorithms for automating
'word of mouth'" was published at the CHI-95 Proceedings
by Patti Maes et al and describes the Ringo information
retrieval system which recommends musical selections. The
Ringo system requires active feedback from the users;
users must manually specify how much they like or dislike
each musical selection. The Ringo system maintains a
complete list of users' ratings of music selections and makes
recommendations by finding which selections were liked by
multiple people. However, the Ringo system does not take
advantage of any available descriptions of the music, such as
structured descriptions in a data base, or free text, such as
that contained in music reviews. An article titled "Evolving
agents for personalized information filtering", published at
the Proc. 9th IEEE Conf. on AI for Applications by Sheth
and Maes, described the use of agents for information
filtering which use genetic algorithms to learn to categorize
Usenet news articles. In this system, users must define news
categories and the users actively indicate their opinion of the
selected articles. Their system uses a list of keywords to
represent sets of articles and the records of users' interests
are updated using genetic algorithms.

A number of other research groups have looked at the
automatic generation and labeling of clusters of articles for
the purpose of browsing through the articles. A group at
Xerox PARC published a paper titled "Scatter/gather: a
cluster-based approach to browsing large article collections"
at the 15 Ann. Int'l SIGIR '92, ACM 318-329 (Cutting et al.
1992). This group developed a method they call "scatter/gather"
for performing information retrieval searches. In this
method, a collection of articles is "scattered" into a small
number of clusters, the user then chooses one or more of
these clusters based on short summaries of the cluster. The
selected clusters are then "gathered" into a subcollection,
and then the process is repeated. Each iteration of this
process is expected to produce a small, more focused
collection. The cluster "summaries" are generated by picking
those words which appear most frequently in the cluster
and the titles of those articles closest to the center of the
cluster. However, no feedback from users is collected or
stored, so no performance improvement occurs over time.

Apple's Advanced Technology Group has developed an
interface based on the concept of a "pile of articles". This
interface is described in an article titled "A 'pile' metaphor
for supporting casual organization of information in Human
factors in computer systems" published in CHI '92 Conf.
Proc. 627-634 by Mander, R. G. Salomon and Y. Wong.
1992. Another article titled "Content awareness in a file
system interface: implementing the 'pile' metaphor for organizing
information" was published in 16 Ann. Int'l SIGIR
'93, ACM 260-269 by Rose E. D. et al. The Apple interface
uses word frequencies to automatically file articles by picking
the pile most similar to the article being filed. This
system functions to cluster articles into subpiles, determine
key words for indexing by picking the words with the largest
TF/IDF (where TF is term (word) frequency and IDF is the
inverse document frequency) and label piles by using the
determined key words.

Numerous patents address information retrieval methods,
but none develop records of a user's interest based on
passive monitoring of which articles the user accesses. None
of the systems described in these patents present computer
architectures to allow fast retrieval of articles distributed
across many computers. None of the systems described in
these patents address issues of using such article retrieval
and matching methods for purposes of commerce or of
matching users with common interests or developing records
of users' interests. U.S. Pat. No. 5,321,833 issued to Chang
et al. teaches a method in which users choose terms to use
in an information retrieval query, and specify the relative
weightings of the different terms. The Chang system then
calculates multiple levels of weighting criteria. U.S. Pat. No.
5,301,109 issued to Landauer et al. teaches a method for
retrieving articles in a multiplicity of languages by constructing
"latent vectors" (SVD or PCA vectors) which
represent correlations between the different words. U.S. Pat.
No. 5,331,554 issued to Graham et al. discloses a method for
retrieving segments of a manual by comparing a query with
nodes in a decision tree. U.S. Pat. No. 5,331,556 addresses
techniques for deriving morphological part-of-speech information
and thus to make use of the similarities of different
forms of the same word (e.g. "article" and "articles").

Therefore, there presently is no information retrieval and
delivery system operable in an electronic media environment
that enables a user to access information of relevance
and interest to the user without requiring the user to expend
an excessive amount of time and energy.

SOLUTION

The above-described problems are solved and a technical
advance achieved in the field by the system for customized
electronic identification of desirable objects in an electronic
media environment, which system enables a user to access
target objects of relevance and interest to the user without
requiring the user to expend an excessive amount of time
and energy. Profiles of the target objects are stored on
electronic media and are accessible via a data communication
network. In many applications, the target objects are
informational in nature, and so may themselves be stored on
electronic media and be accessible via a data communication
network.

Relevant definitions of terms for the purpose of this
description include: (a.) an object available for access by the
user, which may be either physical or electronic in nature, is
termed a "target object", (b.) a digitally represented profile
indicating that target object's attributes is termed a "target
profile", (c.) the user looking for the target object is termed
a "user", (d.) a profile holding that user's attributes, including
age/zip code/etc. is termed a "user profile", (e.) a
summary of digital profiles of target objects that a user likes
and/or dislikes, is termed the "target profile interest summary"
of that user, (f.) a profile consisting of a collection of
attributes, such that a user likes target objects whose profiles
are similar to this collection of attributes, is termed a "search
profile" or in some contexts a "query" or "query profile," (g.)
a specific embodiment of the target profile interest summary
which comprises a set of search profiles is termed the
"search profile set" of a user, (h.) a collection of target
objects with similar profiles, is termed a "cluster," (i.) an
aggregate profile formed by averaging the attributes of all
target objects in a cluster, is termed a "cluster profile," (j.) a real
number determined by calculating the statistical variance of
the profiles of all target objects in a cluster, is termed a
"cluster variance," (k.) a real number determined by calculating
the maximum distance between the profiles of any two
target objects in a cluster, is termed a "cluster diameter."
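As an illustration of definitions (i.) through (k.), the following is a minimal sketch that assumes each target profile has already been reduced to a fixed-length list of numeric attributes and that Euclidean distance is the distance measure. Both assumptions, and all function names, are hypothetical and are not taken from the specification.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two numeric profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_profile(profiles):
    """Cluster profile: average each attribute over all profiles in the cluster."""
    n = len(profiles)
    dims = len(profiles[0])
    return [sum(p[d] for p in profiles) / n for d in range(dims)]


def cluster_variance(profiles):
    """Cluster variance: mean squared distance of each profile from the cluster profile."""
    center = cluster_profile(profiles)
    return sum(distance(p, center) ** 2 for p in profiles) / len(profiles)


def cluster_diameter(profiles):
    """Cluster diameter: largest distance between the profiles of any two members."""
    return max((distance(a, b) for a, b in combinations(profiles, 2)), default=0.0)


# Three hypothetical two-attribute target profiles forming one cluster.
cluster = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
center = cluster_profile(cluster)      # [1.0, 1.0]
variance = cluster_variance(cluster)   # about 2.667
diameter = cluster_diameter(cluster)   # sqrt(10), about 3.162
```

A small variance or diameter indicates a tightly grouped cluster; the choice of a scalar variance (mean squared distance to the center) is one of several reasonable readings of "statistical variance of the profiles".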

The system for electronic identification of desirable
objects of the present invention automatically constructs
both a target profile for each target object in the electronic
media based, for example, on the frequency with which each
word appears in an article relative to its overall frequency of
use in all articles, as well as a "target profile interest
summary" for each user, which target profile interest summary
describes the user's interest level in various types of
target objects. The system then evaluates the target profiles
against the users' target profile interest summaries to generate
a user-customized rank ordered listing of target objects
most likely to be of interest to each user so that the user can
select from among these potentially relevant target objects,
which were automatically selected by this system from the
plethora of target objects available on the electronic media.
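The two steps described above, profile construction and rank ordering, can be sketched as follows. This is only an illustrative sketch under stated assumptions: word weights are the word's frequency in the article divided by its frequency across all articles, the interest summary is a list of search profiles, and cosine similarity is the matching score. The function names, the sample articles and the cosine measure are hypothetical choices, not the specification's method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip('.,;:!?"()').lower() for w in text.split())
    return [w for w in words if w]


def build_target_profiles(articles):
    """Map article id -> {word: weight}, where weight is the word's frequency
    in the article relative to its overall frequency in all articles."""
    counts = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    total = sum(overall.values())
    profiles = {}
    for aid, c in counts.items():
        n = sum(c.values())
        profiles[aid] = {w: (k / n) / (overall[w] / total) for w, k in c.items()}
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse word-weight profiles."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_for_user(profiles, search_profiles):
    """Rank article ids by their best match against any of the user's search profiles."""
    scores = {aid: max(cosine(p, s) for s in search_profiles)
              for aid, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


articles = {
    "a1": "Stocks fell as bond markets rallied.",
    "a2": "The team won the football match.",
    "a3": "Bond yields rose and stocks slipped.",
}
profiles = build_target_profiles(articles)
search_profiles = [{"stocks": 1.0, "bond": 1.0}]  # one hypothetical area of interest
ranking = rank_for_user(profiles, search_profiles)
```

With these inputs the two finance articles score above zero and the sports article scores zero, so it lands last in the ranking.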

Because people have multiple interests, a target profile
interest summary for a single user must represent multiple
areas of interest, for example, by consisting of a set of
individual search profiles, each of which identifies one of the
user's areas of interest. Each user is presented with those
target objects whose profiles most closely match the user's
interests as described by the user's target profile interest
summary. Users' target profile interest summaries are automatically
updated on a continuing basis to reflect each user's
changing interests. In addition, target objects can be grouped
into clusters based on their similarity to each other, for
example, based on similarity of their topics in the case where
the target objects are published articles, and menus automatically
generated for each cluster of target objects to allow
users to navigate throughout the clusters and manually
locate target objects of interest. For reasons of confidentiality
and privacy, a particular user may not wish to make
public all of the interests recorded in the user's target profile
interest summary, particularly when these interests are determined
by the user's purchasing patterns. The user may
desire that all or part of the target profile interest summary
be kept confidential, such as information relating to the
user's political, religious, financial or purchasing behavior;
indeed, confidentiality with respect to purchasing behavior
is the user's legal right in many states. It is therefore
necessary that data in a user's target profile interest summary
be protected from unwanted disclosure except with the
user's agreement. At the same time, the user's target profile
interest summaries must be accessible to the relevant servers
that perform the matching of target objects to the users, if the
benefit of this matching is desired by both providers and
consumers of the target objects. The disclosed system provides
a solution to the privacy problem by using a proxy
server which acts as an intermediary between the information
provider and the user. The proxy server dissociates the
user's true identity from the pseudonym by the use of
cryptographic techniques. The proxy server also permits
users to control access to their target profile interest summaries
and/or user profiles, including provision of this
information to marketers and advertisers if they so desire,
possibly in exchange for cash or other considerations. Marketers
may purchase these profiles in order to target advertisements
to particular users, or they may purchase partial
user profiles, which do not include enough information to
identify the individual users in question, in order to carry out
standard kinds of demographic analysis and market research
on the resulting database of partial user profiles. Pseudonymous
control of an information server suggests how a
special discount can be issued to a user's pseudonym and
that such a digital credential is provided to the user as a
result of his/her user profile making him/her eligible. The
user may thus present this type of credential to the appropriate
vendor to take advantage of the discount. This technique
can be extended also to smart cards wherein the digital
credential providing the discount is downloaded from the
client to the smart card and upon presentation, the vendor
may, if desired, delete the credential upon redemption by the
user. These discount credentials may similarly include any
of the discount types (customized promotions) herein disclosed
wherein each purchase may be identified (characterized)
and credentialized by the vendor onto the user's smart card
and/or the vendor's system.
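The role of the proxy server described above, holding interest summaries under a pseudonym and releasing them only with the user's agreement, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, and deriving the pseudonym with a keyed hash (HMAC-SHA256) is a stand-in technique chosen for brevity, not the cryptographic protocol of the disclosed system.

```python
import hashlib
import hmac
import secrets


class PseudonymProxy:
    """Toy intermediary: stores interest summaries under pseudonyms and
    releases them only to third parties the user has approved."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret known only to the proxy
        self._summaries = {}                 # pseudonym -> interest summary
        self._grants = {}                    # pseudonym -> approved third parties

    def pseudonym_for(self, true_identity):
        """Derive a stable pseudonym; without the key it cannot be linked
        back to the true identity (assumed HMAC-based scheme)."""
        digest = hmac.new(self._key, true_identity.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def store_summary(self, true_identity, summary):
        pseudonym = self.pseudonym_for(true_identity)
        self._summaries[pseudonym] = summary
        self._grants.setdefault(pseudonym, set())
        return pseudonym

    def grant(self, true_identity, third_party):
        """The user agrees to disclose the summary to one third party."""
        self._grants[self.pseudonym_for(true_identity)].add(third_party)

    def read_summary(self, pseudonym, third_party):
        if third_party not in self._grants.get(pseudonym, set()):
            raise PermissionError("user has not granted access to this party")
        return self._summaries[pseudonym]


proxy = PseudonymProxy()
pseudonym = proxy.store_summary("alice@example.com", {"finance": 0.9})
proxy.grant("alice@example.com", "advertiser-1")
summary = proxy.read_summary(pseudonym, "advertiser-1")
```

A matching server in this sketch only ever sees the pseudonym and the summary, never the true identity, and any party the user has not approved is refused.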

In the preferred embodiment of the invention, the system
for customized electronic identification of desirable objects
uses a fundamental methodology for accurately and efficiently
matching users and target objects by automatically
calculating, using and updating profile information that
describes both the users' interests and the target objects'
characteristics. The target objects may be published articles,
purchasable items, or even other people, and their properties
are stored, and/or represented and/or denoted on the electronic
media as (digital) data. Examples of target objects can
include, but are not limited to: a newspaper story of potential
interest, a movie to watch, an item to buy, e-mail to receive,
or another person to correspond with. In one suggested
application, the user is a sender of email (which may have
originated from the user or from another external source
such as from outside of a large organization) and the target
objects are users who might be considered most appropriate
based upon previous messages which they have received,
read and responded to. Accordingly, like other target objects,
users (or user pseudonyms) in accordance with their user
profiles (or portions of which they have disclosed) may be
organized and browsed within an automatically generated
menu tree, which is below described in detail. In all these
cases, the information delivery process in the preferred
embodiment is based on determining the similarity between
a profile for the target object and the profiles of target objects
for which the user (or a similar user) has provided positive
feedback in the past. The individual data that describe a
target object and constitute the target object's profile are
herein termed "attributes" of the target object. Attributes
may include, but are not limited to, the following: (1) long
pieces of text (a newspaper story, a movie review, a product
description or an advertisement), (2) short pieces of text
(name of a movie's director, name of town from which an
advertisement was placed, name of the language in which an
article was written), (3) numeric measurements (price of a
product, rating given to a movie, reading level of a book), (4)
associations with other types of objects (list of actors in a
movie, list of persons who have read a document). Any of
these attributes, but especially the numeric ones, may correlate
with the quality of the target object, such as measures
of its popularity (how often it is accessed) or of user
satisfaction (number of complaints received).
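The four kinds of attributes listed above, and the idea of scoring a new target object by its similarity to objects the user has liked, can be sketched as follows. This is a hedged illustration: the `TargetProfile` class, the per-kind similarity formulas (cosine, exact match, inverse absolute difference, Jaccard overlap) and the use of the best match as the interest estimate are all assumptions introduced here, not the specification's formulas.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TargetProfile:
    long_text: dict = field(default_factory=dict)     # word -> weight
    short_text: dict = field(default_factory=dict)    # e.g. "director" -> name
    numeric: dict = field(default_factory=dict)       # e.g. "rating" -> 4.5
    associations: dict = field(default_factory=dict)  # e.g. "actors" -> set of names


def _cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def _jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(a, b):
    """Average of per-attribute similarities, each in [0, 1] (assumed scheme)."""
    parts = []
    if a.long_text and b.long_text:
        parts.append(_cosine(a.long_text, b.long_text))
    for key in a.short_text.keys() & b.short_text.keys():
        parts.append(1.0 if a.short_text[key] == b.short_text[key] else 0.0)
    for key in a.numeric.keys() & b.numeric.keys():
        parts.append(1.0 / (1.0 + abs(a.numeric[key] - b.numeric[key])))
    for key in a.associations.keys() & b.associations.keys():
        parts.append(_jaccard(a.associations[key], b.associations[key]))
    return sum(parts) / len(parts) if parts else 0.0


def estimate_interest(candidate, liked):
    """Interest estimate: similarity to the closest positively rated object."""
    return max((similarity(candidate, p) for p in liked), default=0.0)


liked = TargetProfile(short_text={"director": "Hitchcock"}, numeric={"rating": 4.5},
                      associations={"actors": {"Stewart", "Kelly"}})
close = TargetProfile(short_text={"director": "Hitchcock"}, numeric={"rating": 4.0},
                      associations={"actors": {"Stewart", "Novak"}})
far = TargetProfile(short_text={"director": "Other"}, numeric={"rating": 2.0},
                    associations={"actors": {"Smith"}})
```

In practice numeric attributes would need to be scaled before comparison, since a price difference and a rating difference are not on the same footing; the sketch ignores that.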

The preferred embodiment of the system for customized 
electronic identification of desirable objects operates in an 
electronic media environment for accessing these target 
objects, which may be news, electronic mail, other pub- 
lished documents, or product descriptions. The system in its 
broadest construction comprises three conceptual modules, 
which may be separate entities distributed across many 
implementing systems, or combined into a lesser subset of 
physical entities. The specific embodiment of this system 



03/09/2004, EAST Version: 1.4.1 



6,029,195 

disclosed herein illustrates the use of a first module which
automatically constructs a "target profile" for each target
object in the electronic media based on various descriptive
attributes of the target object. A second module uses interest
feedback from users to construct a "target profile interest
summary" for each user, for example in the form of a "search
profile set" consisting of a plurality of search profiles, each of
which corresponds to a single topic of high interest for the
user. The system further includes a profile processing module
which estimates each user's interest in various target
objects by reference to the users' target profile interest
summaries, for example by comparing the target profiles of
these target objects against the search profiles in users'
search profile sets, and generates for each user a customized
rank-ordered listing of target objects most likely to be of
interest to that user. Each user's target profile interest
summary is automatically updated on a continuing basis to
reflect the user's changing interests.

Target objects may be of various sorts, and it is sometimes
advantageous to use a single system that delivers and/or
clusters target objects of several distinct sorts at once, in a
unified framework. For example, users who exhibit a strong
interest in certain novels may also show an interest in certain
movies, presumably of a similar nature. A system in which
some target objects are novels and other target objects are
movies can discover such a correlation and exploit it in order
to group particular novels with particular movies, e.g., for
clustering purposes, or to recommend the movies to a user
who has demonstrated interest in the novels. Similarly, if
users who exhibit an interest in certain World Wide Web
sites also exhibit an interest in certain products, the system
can match the products with the sites and thereby recommend
to the marketers of those products that they place
advertisements at those sites, e.g., in the form of hypertext
links to their own sites. The presently described system [...]
that particular product or service, and routine advertisements
[...] assumes that because [...] acoustic voice chat) using a
text to [...] in conjunction with [...] within that chat session.
Advertisements which are relevant [...] nature of the content
being discussed at present may provide temporary links to
the appropriate product such that when the nature of the
content changes the advertisements change (may disappear)
accordingly.

The ability to measure the similarity of profiles describing
target objects and a user's interests can be applied in two
basic ways: filtering and browsing. Filtering is useful when
large numbers of target objects are described in the electronic
media space. These target objects can for example be
articles that are received or potentially received by a user,
who only has time to read a small fraction of them. For
example, one might potentially receive all items on the AP
news wire service, all items posted to a number of news
groups, all advertisements in a set of newspapers, or all
unsolicited electronic mail, but few people have the time or
inclination to read so many articles. A filtering system in the
system for customized electronic identification of desirable
objects automatically selects a set of articles that the user is
likely to wish to read. The accuracy of this filtering system
improves over time by noting which articles the user reads
and by generating a measurement of the depth to which the
user reads each article. This information is then used to
update the user's target profile interest summary. Browsing
provides an alternate method of selecting a small subset of
a large number of target objects, such as articles. Articles are
organized so that users can actively navigate among groups
of articles by moving from one group to a larger, more
general group, to a smaller, more specific group, or to a
closely related group. Each individual article forms a one-member
group of its own, so that the user can navigate to
and from individual articles as well as larger groups. The
methods used by the system for customized electronic
identification of desirable objects allow articles to be
grouped into clusters and the clusters to be grouped and
merged into larger and larger clusters. These hierarchies of
clusters then form the basis for menuing and navigational
systems to allow the rapid searching of large numbers of
articles. This same clustering technique is applicable to any
type of target objects that can be profiled on the electronic
media such as product selections within a menu or throughout
the World Wide Web.

There are a number of variations on the theme of developing
and using profiles for article retrieval. Variations of
this basic system are disclosed and comprise a system to
filter electronic mail, an extension for retrieval of target
objects such as purchasable items which may have more
complex descriptions, a system to automatically build and
alter menuing systems for browsing and searching through
large numbers of target objects, and a system to
construct virtual communities of people with common
interests. These intelligent filters and browsers are necessary
to provide a truly passive, intelligent system interface. A user
interface that permits intuitive browsing and filtering
represents for the first time an intelligent system for
determining the affinities between users and target objects.
The detailed, comprehensive target profiles and user-specific
target profile interest summaries enable the system to provide
responsive routing of specific queries for user information
access. The information maps so produced and the application
of users' target profile interest summaries to predict the
information consumption patterns of a user allows for
pre-caching of data at locations on the data communication
network and at times that minimize the traffic flow in the
communication network to thereby efficiently provide the
desired information to the user and/or conserve valuable
storage space by only storing those target objects (or
segments thereof) which are relevant to the user's interests.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates in block diagram form a typical architecture
of an electronic media system in which the system
for customized electronic identification of desirable objects
of the present invention can be implemented as part of a user
server system;

FIG. 2 illustrates in block diagram form one embodiment
of the system for customized electronic identification of
desirable objects;

FIGS. 3 and 4 illustrate typical network trees;

FIG. 5 illustrates in flow diagram form a method for
automatically generating article profiles and an associated
hierarchical menu system;

FIGS. 6-9 illustrate examples of menu generating process;

FIG. 10 illustrates in flow diagram form the operational
steps taken by the system for customized electronic identification
of desirable objects to screen articles for a user;






FIG. 11 illustrates a hierarchical cluster tree example; 

FIG. 12 illustrates in flow diagram form the process for 
determination of likelihood of interest by a ^ecific user in 
a selected target object; 

FIGS. 13A-B illxistrate in flow diagram form the auto- ^ 
matic clustering process; 

FIG. 14 illustrates in flow diagram form the use of the 
pseudonymous server; 

FIG. 15 illustrates in flow diagram form the use of the 
system for accessing information in response to a user 
query; and 

FIG. 16 illustrates in flow diagram form the use of the 
system for accessing information in response to a user query 
when the system is a distributed network implementation.

DETAILED DESCRIPTION

MEASURING SIMILARITY

This section describes a general procedure for automatically
measuring the similarity between two target objects, or,
more precisely, between target profiles that are automatically
generated for each of the two target objects. This similarity
determination process is applicable to target objects in a
wide variety of contexts. Target objects being compared can
be, as an example but not limited to: textual documents,
human beings, movies, or mutual funds. It is assumed that
the target profiles which describe the target objects are
stored at one or more locations in a data communication
network on data storage media associated with a computer
system.

The computed similarity measurements serve as input to
additional processes, which function to enable human users
to locate desired target objects using a large computer
system. These additional processes estimate a human user's
interest in various target objects, or else cluster a plurality of
target objects into logically coherent groups. The methods
used by these additional processes might in principle be
implemented on either a single computer or on a computer
network. Jointly or separately, they form the underpinning
for various sorts of database systems and information
retrieval systems.
Target Objects and Attributes 

In classical Information Retrieval (IR) technology, the 
user is a literate human and the target objects in question are 
textual documents stored on data storage devices intercon- 
nected to the user via a computer network. That is, the target 
objects consist entirely of text, and so are digitally stored on 
the data storage devices within the computer network. 
However, there are other target object domains that present
related retrieval problems which present information
retrieval technology is not capable of solving, and which
are applicable to targeting of articles and advertisements to
readers of an on-line newspaper:

(a.) the user is a film buff and the target objects are movies
available on videotape.

(b.) the user is a consumer and the target objects are used
cars being sold.

(c.) the user is a consumer and the target objects are
products being sold through promotional deals.

(d.) the user is an investor and the target objects are
publicly traded stocks, mutual funds and/or real estate
properties.

(e.) the user is a student and the target objects are classes
being offered.

(f.) the user is an activist and the target objects are
Congressional bills of potential concern.

(g.) the user is about to send an e-mail message and the
target objects are potential recipients who are interested
in the content of that message.

(h.) the user is a corporate receptionist receiving incoming
e-mail, voice mail or live telephone calls and the target
objects are the employees who are the most qualified
to handle those incoming media.

(i.) the user is a net-surfer and the target objects are links
to pages, servers, or newsgroups available on the World
Wide Web which are linked from pages and articles in
the on-line newspaper.

(j.) the user is a philanthropist and the target objects are
charities.

(k.) the user is ill and the target objects are ads for medical
specialists.

(l.) the user is an employee and the target objects are
classifieds for potential employers.

(m.) the user is an employer and the target objects are
classifieds for potential employees.

(n.) the user is a lonely heart and the target objects are
classifieds for potential conversation partners.

(o.) the user is in search of an expert and the target objects
are users, with known retrieval habits, of a document
retrieval system.

(p.) the user is in need of insurance and the target objects
are classifieds for insurance policy offers.

In all these cases, the user wishes to locate some small
subset of the target objects, such as the target objects that
the user most desires to rent, buy, investigate, meet, read,
give mammograms to, insure, and so forth. The task is to
help the user identify the most interesting target objects,
where the user's interest in a target object is defined to be a
numerical measurement of the user's relative desire to locate
that object rather than others.

The generality of this problem motivates a general 
approach to solving the information retrieval problems noted 
above. It is assumed that many target objects are known to 
the system for customized electronic identification of desir- 
able objects, and that specifically, the system stores (or has 
the ability to reconstruct) several pieces of information 
about each target object. These pieces of information are
termed "attributes"; collectively, they are said to form a
profile of the target object, or a "target profile." For example,
where the system for customized electronic identification of
desirable objects is activated to identify selections of interest
in a particular category of on-line products for review or
purchase by the user, it can be appreciated that there are
certain unique sets of attributes which are pertinent to the
particular product category of choice. For the application as
part of a movie critic column (where the system identifies
novel titles and reviews which are most interesting to the
user) the system is likely to be concerned with the values of
attributes such as these:

(a.) title of movie,
(b.) name of director,
(c.) Motion Picture Association of America (MPAA)
child-appropriateness rating (0=G, 1=PG, . . . ),
(d.) date of release,
(e.) number of stars granted by a particular critic,
(f.) number of stars granted by a second critic,
(g.) number of stars granted by a third critic,
(h.) full text of review by the third critic,
(i.) list of customers who have previously rented this
movie,
(j.) list of actors.

For example, a customized financial news column may be
presented to the user in the form of articles which are of
interest to the user. In this case, accordingly, those stocks
which are most interesting to the user may be presented as
well.

Each movie has a different set of values for these
attributes. This example conveniently illustrates three kinds
of attributes. Attributes c-g are numeric attributes, of the
sort that might be found in a database record. It is evident
that they can be used to help the user identify target objects
(movies) of interest. For example, the user might previously
have rented many Parental Guidance (PG) films, and many
films made in the 1970's. This generalization is useful: new
films with values for one or both attributes that are
numerically similar to these (such as MPAA rating of 1, release
date of 1975) are judged similar to the films the user already
likes, and therefore of probable interest. Attributes a-b and
h are textual attributes. They too are important for helping
the user locate desired films. For example, perhaps the user
has shown a past interest in films whose review text
(attribute h) contains words like "chase," "explosion,"
"explosions," "hero," "gripping," and "superb." This
generalization is again useful in identifying new films of
interest. Attribute i is an associative attribute. It records
associations between the target objects in this domain, namely
movies, and ancillary target objects of an entirely different
sort, namely humans. A good indication that the user wants
to rent a particular movie is that the user has previously
rented other movies with similar attribute values, and this
holds for attribute i just as it does for attributes a-h. For
example, if the user has often liked movies that two
particular customers have rented, then the user may like
other such movies, which have similar values for attribute i.
Attribute j is another example of an associative attribute,
recording associations between target objects and actors.
Notice that any of these attributes can be made subject to
authentication when the profile is constructed, through the
use of digital signatures; for example, the target object could
be accompanied by a digitally signed note from the MPAA,
which note names the target object and specifies its authentic
value for attribute c.
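
The three kinds of attributes in the movie example can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `TargetProfile`, its field names, and all sample values are hypothetical and do not come from the specification, which does not prescribe any storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TargetProfile:
    """Hypothetical container for one target object's attributes."""
    # Numeric attributes, e.g. MPAA rating code, release year, critic stars.
    numeric: Dict[str, float] = field(default_factory=dict)
    # Textual attributes, e.g. title, director name, review text.
    textual: Dict[str, str] = field(default_factory=dict)
    # Associative attributes: sets of ancillary objects (renters, actors).
    associative: Dict[str, Set[str]] = field(default_factory=dict)

    def kind_of(self, name: str) -> str:
        """Return which of the three attribute kinds holds `name`."""
        if name in self.numeric:
            return "numeric"
        if name in self.textual:
            return "textual"
        if name in self.associative:
            return "associative"
        raise KeyError(name)


# Illustrative movie profile; every value here is invented for the example.
movie = TargetProfile(
    numeric={"mpaa_rating": 1, "release_year": 1975, "stars_critic_1": 3},
    textual={"title": "Example Movie", "director": "Example Director"},
    associative={"renters": {"customer_17", "customer_165"},
                 "actors": {"actor_a"}},
)
```

Keeping the three kinds apart in this way is one possible design choice; it makes it straightforward to apply a different distance computation to each kind later on.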

These three kinds of attributes are common: numeric,
textual, and associative. In the classical information retrieval
problem, where the target objects are documents (or more
generally, coherent document sections extracted by a text
segmentation method), the system might only consider a
single, textual attribute when measuring similarity: the full
text of the target object. However, a more sophisticated
system would consider a longer target profile, including
numeric and associative attributes:

(a.) full text of document (textual),
(b.) title (textual),
(c.) author (textual),
(d.) language in which document is written (textual),
(e.) date of creation (numeric),
(f.) date of last update (numeric),
(g.) length in words (numeric),
(h.) reading level (numeric),
(i.) quality of document as rated by a third party editorial
agency (numeric),
(j.) list of other readers who have retrieved this document
(associative).

As another domain example, consider a domain where the
user is an advertiser and the target objects are potential
customers. The system might store the following attributes
for each target object (potential customer):

(a.) first two digits of zip code (textual),
(b.) first three digits of zip code (textual),
(c.) entire five-digit zip code (textual),
(d.) distance of residence from advertiser's nearest physical
storefront (numeric),
(e.) annual family income (numeric),
(f.) number of children (numeric),
(g.) list of previous items purchased by this potential
customer (associative),
(h.) list of filenames stored on this potential customer's
client computer (associative),
(i.) list of movies rented by this potential customer
(associative),
(j.) list of investments in this potential customer's
investment portfolio (associative),
(k.) list of documents retrieved by this potential customer
(associative),
(l.) written response to Rorschach inkblot test (textual),
(m.) multiple-choice responses by this customer to 20
self-image questions (20 textual attributes).

As always, the notion is that similar consumers buy
similar products. It should be noted that diverse sorts of
information are being used here to characterize consumers,
from their consumption patterns to their literary tastes and
psychological peculiarities, and that this fact illustrates both
the flexibility and power of the system for customized
electronic identification of desirable objects of the present
invention. Diverse sorts of information can be used as
attributes in other domains as well (as when physical,
economic, psychological and interest-related questions are
used to profile the applicants to a dating service, which is
indeed a possible domain for the present system), and the
advertiser domain is simply an example.
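
Attributes (a.) to (c.) of the advertiser example store the same zip code at three levels of detail. The text does not state why; one plausible reading, offered here purely as an assumption, is that two customers who share only a short prefix still match on some of the three attributes, so geographic closeness is graded rather than all-or-nothing. The function names in this Python sketch are hypothetical.

```python
from typing import Dict


def zip_prefix_attributes(zip_code: str) -> Dict[str, str]:
    """Split a five-digit zip code into the three textual attributes (a.)-(c.)."""
    if len(zip_code) != 5 or not zip_code.isdigit():
        raise ValueError("expected a five-digit zip code")
    return {
        "zip2": zip_code[:2],   # (a.) first two digits
        "zip3": zip_code[:3],   # (b.) first three digits
        "zip5": zip_code,       # (c.) entire five-digit code
    }


def matching_zip_attributes(zip_a: str, zip_b: str) -> int:
    """Count how many of the three zip attributes two customers share."""
    attrs_a = zip_prefix_attributes(zip_a)
    attrs_b = zip_prefix_attributes(zip_b)
    return sum(attrs_a[name] == attrs_b[name] for name in attrs_a)
```

Under this reading, identical codes share all three attributes, codes that agree on three digits share two, and codes in unrelated regions share none.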

As a final domain example, consider a domain where the
user is a stock market investor and the target objects are
publicly traded corporations. A great many attributes might
be used to characterize each corporation, including but not
limited to the following:

(a.) type of business (textual),
(b.) corporate mission statement (textual),
(c.) number of employees during each of the last 10 years
(ten separate numeric attributes),
(d.) percentage growth in number of employees during
each of the last 10 years,
(e.) dividend payment issued in each of the last 40
quarters, as a percentage of current share price,
(f.) percentage appreciation of stock value during each of
the last 40 quarters,
(g.) list of shareholders (associative),
(h.) composite text of recent articles about the corporation
in the financial press (textual).

For example, a customized financial news column may be 
presented to the user in the form of articles which are of 
interest to the user. In addition, those stocks which are most 
interesting to the user may be presented as well. 
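
Some of the corporate attributes listed above are derived from others: percentage growth in the number of employees follows from the yearly employee counts. A minimal Python sketch of that arithmetic is given below. The function name and the sample figures are hypothetical, and the handling of a zero count is an assumption of the sketch, since the text does not address it.

```python
from typing import List


def percentage_growth(counts: List[float]) -> List[float]:
    """Year-over-year percentage change for a chronological list of counts."""
    growth = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            # Assumption of this sketch: growth from zero is left undefined.
            raise ValueError("growth is undefined when the previous count is zero")
        growth.append(100.0 * (current - previous) / previous)
    return growth


# Invented figures: three yearly counts yield two growth values.
example = percentage_growth([100, 110, 99])
```

Note that n yearly counts give n-1 growth figures, so ten growth attributes would need eleven years of counts.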

It is worth noting some additional attributes that are of 
interest in some domains. In the case of documents and 
certain other domains, it is useful to know the source of each 
target object (for example, refereed journal article vs. UPI 
newswire article vs. Usenet newsgroup posting vs. question- 
answer pair from a question-and-answer list vs. tabloid 
newspaper article vs. . . . ); the source may be represented
as a single-term textual attribute. Important associative
attributes for a hypertext document are the list of documents
that it links to, and the list of documents that link to it.
Documents with similar citations are similar with respect to
the former attribute, and documents that are cited in the
same places are similar with respect to the latter. A convention
may optionally be adopted that any document also links
to itself. Especially in systems where users can choose
whether or not to retrieve a target object, a target object's
popularity (or circulation) can be usefully measured as a
numeric attribute specifying the number of users who have
retrieved that object. Related measurable numeric attributes
that also indicate a kind of popularity include the number of
replies to a target object, in the domain where target objects
are messages posted to an electronic community such as a
computer bulletin board or newsgroup, and the number of
links leading to a target object, in the domain where target
objects are interlinked hypertext documents on the World
Wide Web or a similar system. A target object may also
receive explicit numeric evaluations (another kind of
numeric attribute) from various groups, such as the Motion
Picture Association of America (MPAA), as above, which
rates movies' appropriateness for children, or the American
Medical Association, which might rate the accuracy and
novelty of medical research papers, or a random survey
sample of users (chosen from all users or a selected set of
experts), who could be asked to rate nearly anything. Certain
other types of evaluation, which also yield numeric
attributes, may be carried out mechanically. For example,
the difficulty of reading a text can be assessed by standard
procedures that count word and sentence lengths, while the
vulgarity of a text could be defined as (say) the number of
vulgar words it contains, and the expertise of a text could be
crudely assessed by counting the number of similar texts its
author had previously retrieved and read using the invention,
perhaps confining this count to texts that have high approval
ratings from critics. Finally, it is possible to synthesize
certain textual attributes mechanically, for example to
reconstruct the script of a movie by applying speech
recognition techniques to its soundtrack or by applying optical
character recognition techniques to its closed-caption
subtitles.
Decomposing Complex Attributes

Although textual and associative attributes are large and
complex pieces of data, for information retrieval purposes
they can be decomposed into smaller, simpler numeric
attributes. This means that any set of attributes can be
replaced by a (usually larger) set of numeric attributes, and
hence that any profile can be represented as a vector of
numbers denoting the values of these numeric attributes. In
particular, a textual attribute, such as the full text of a movie
review, can be replaced by a collection of numeric attributes
that represent scores to denote the presence and significance
of the words "aardvark," "aback," "abacus," and so on
through "zymurgy" in that text. The score of a word in a text
may be defined in numerous ways. The simplest definition is
that the score is the rate of the word in the text, which is
computed by counting the number of times the word
occurs in the text, and dividing this number by the total
number of words in the text. This sort of score is often called
the "term frequency" (TF) of the word. The definition of
term frequency may optionally be modified to weight
different portions of the text unequally: for example, any
occurrence of a word in the text's title might be counted as
a 3-fold or more generally k-fold occurrence (as if the title
had been repeated k times within the text), in order to reflect
a heuristic assumption that the words in the title are
particularly important indicators of the text's content or topic.
However, for lengthy textual attributes, such as the text of
an entire document, the score of a word is typically defined
to be not merely its term frequency, but its term frequency
multiplied by the negated logarithm of the word's "global
frequency," as measured with respect to the textual attribute
in question. The global frequency of a word, which
effectively measures the word's uninformativeness, is a fraction
between 0 and 1, defined to be the fraction of all target
objects for which the textual attribute in question contains
this word. This adjusted score is often known in the art as
TF/IDF ("term frequency times inverse document
frequency"). When global frequency of a word is taken into
account in this way, the common, uninformative words have
scores comparatively close to zero, no matter how often or
rarely they appear in the text. Thus, their rate has little
influence on the object's target profile. Alternative methods
of calculating word scores include latent semantic indexing
or probabilistic models.

Instead of breaking the text into its component words, one
could alternatively break the text into overlapping word
bigrams (sequences of 2 adjacent words), or more generally,
word n-grams. These word n-grams may be scored in the
same way as individual words. Another possibility is to use
character n-grams. For example, this sentence contains a
sequence of overlapping character 5-grams which starts "for
e", "or ex", "r exa", " exam", "examp", etc. The sentence
may be characterized, imprecisely but usefully, by the score
of each possible character 5-gram ("aaaaa", "aaaab", . . .
"zzzzz") in the sentence. Conceptually speaking, in the
character 5-gram case, the textual attribute would be
decomposed into at least \(26^5 = 11{,}881{,}376\) numeric attributes. Of
course, for a given target object, most of these numeric
attributes have values of 0, since most 5-grams do not appear
in the target object attributes. These zero values need not be
stored anywhere. For purposes of digital storage, the value
of a textual attribute could be characterized by storing the set
of character 5-grams that actually do appear in the text,
together with the nonzero score of each one. Any 5-gram
that is not included in the set can be assumed to have a score
of zero.

The decomposition of textual attributes is not
limited to attributes whose values are expected to be long
texts. A simple, one-term textual attribute can be replaced by
a collection of numeric attributes in exactly the same way.
Consider again the case where the target objects are movies.
The "name of director" attribute, which is textual, can be
replaced by numeric attributes giving the scores for
"Federico-Fellini," "Woody-Allen," "Terence-Davies," and
so forth, in that attribute. For these one-term textual
attributes, the score of a word is usually defined to be its rate
in the text, without any consideration of global frequency.
Note that under these conditions, one of the scores is 1,
while the other scores are 0 and need not be stored. For
example, if Davies did direct the film, then it is
"Terence-Davies" whose score is 1, since "Terence-Davies"
constitutes 100% of the words in the textual value of the "name of
director" attribute. It might seem that nothing has been
gained over simply regarding the textual attribute as having
the string value "Terence-Davies." However, the trick of
decomposing every non-numeric attribute into a collection
of numeric attributes proves useful for the clustering and
decision tree methods described later, which require the
attribute values of different objects to be averaged and/or
ordinally ranked. Only numeric attributes can be averaged or
ranked in this way.

Just as a textual attribute may be
decomposed into a number of component terms (letter or
word n-grams), an associative attribute may be decomposed
into a number of component associations. For instance, in a
domain where the target objects are movies, a typical
associative attribute used in profiling a movie would be a list
of customers who have rented that movie. This list can be
replaced by a collection of numeric attributes, which give
the "association scores" between the movie and each of the
customers known to the system. For example, the 165th such
numeric attribute would be the association score between the
movie and customer #165, where the association score is
defined to be 1 if customer #165 has previously rented the
movie, and 0 otherwise. In a subtler refinement, this
association score could be defined to be the degree of interest,
possibly zero, that customer #165 exhibited in the movie, as
determined by relevance feedback (as described below). As
another example, in a domain where target objects are
companies, an associative attribute indicating the major
shareholders of the company would be decomposed into a
collection of association scores, each of which would
indicate the percentage of the company (possibly zero) owned
by some particular individual or corporate body. Just as with
the term scores used in decomposing lengthy textual
attributes, each association score may optionally be adjusted
by a multiplicative factor: for example, the association score
between a movie and customer #165 might be multiplied by
the negated logarithm of the "global frequency" of customer
#165, i.e., the fraction of all movies that have been rented by
customer #165. Just as with the term scores used in
decomposing textual attributes, most association scores found
when decomposing a particular value of an associative
attribute are zero, and a similar economy of storage may be
gained in exactly the same manner by storing a list of only
those ancillary objects with which the target object has a
nonzero association score, together with their respective
association scores.
Similarity Measures

What does it mean for two target objects to be similar?
More precisely, how should one measure the degree of
similarity? Many approaches are possible and any
reasonable metric that can be computed over the set of target object
profiles can be used, where target objects are considered to
be similar if the distance between their profiles is small
according to this metric. Thus, the following preferred
embodiment of a target object similarity measurement
system has many variations.

First, define the distance between two values of a given
attribute according to whether the attribute is a numeric,
associative, or textual attribute. If the attribute is numeric,
then the distance between two values of the attribute is the
absolute value of the difference between the two values.
(Other definitions are also possible: for example, the
distance between prices \(p_1\) and \(p_2\) might be defined by
\(|p_1 - p_2| / (\max(p_1, p_2) + 1)\), to recognize that when it comes to
customer interest, $5000 and $5020 are very similar,
whereas $3 and $23 are not.) If the attribute is associative,
then its value V may be decomposed as described above into
a collection of real numbers, representing the association
scores between the target object in question and various
ancillary objects. V may therefore be regarded as a vector
with components \(V_1, V_2, V_3\), etc., representing the
association scores between the object and ancillary objects 1, 2, 3,
etc., respectively. The distance between two vector values V
and U of an associative attribute is then computed using the
angle distance measure,

\[ \arccos\left( \frac{V U^{T}}{\sqrt{(V V^{T})(U U^{T})}} \right). \]

(Note that the three inner products in this expression have the form
\(X Y^{T} = X_1 Y_1 + X_2 Y_2 + X_3 Y_3 + \cdots\), and that for efficient
computation, terms of the form \(X_i Y_i\) may be omitted from
this sum if either of the scores \(X_i\) and \(Y_i\) is zero.) Finally, if
the attribute is textual, then its value V may be decomposed
as described above into a collection of real numbers,
representing the scores of various word n-grams or character
n-grams in the text. Then the value V may again be regarded
as a vector, and the distance between two values is again
defined via the angle distance measure. Other similarity
metrics between two vectors, such as the Dice measure, may
be used instead. It happens that the obvious alternative
metric, Euclidean distance, does not work well: even similar
texts tend not to overlap substantially in the content words
they use, so that texts encountered in practice are all
substantially orthogonal to each other, assuming that TF/IDF
scores are used to reduce the influence of non-content words.

The scores of two words in a textual attribute vector may be
correlated; for example, "Kennedy" and "JFK" tend to
appear in the same documents. Thus it may be advisable to
alter the text somewhat before computing the scores of terms
in the text, by using a synonym dictionary that groups
together similar words. The effect of this optional
pre-alteration is that two texts using related words are measured
to be as similar as if they had actually used the same words.
One technique is to augment the set of words actually found
in the article with a set of synonyms or other words which
tend to co-occur with the words in the article, so that
"Kennedy" could be added to every article that mentions
"JFK." Alternatively, words found in the article may be
wholly replaced by synonyms, so that "JFK" might be
replaced by "Kennedy" or by "John F. Kennedy" wherever
it appears. In either case, the result is that documents about
Kennedy and documents about JFK are adjudged similar.
The synonym dictionary may be sensitive to the topic of the
document as a whole; for example, it may recognize that
"crane" is likely to have a different synonym in a document
that mentions birds than in a document that mentions
construction. A related technique is to replace each word by
its morphological stem, so that "staple", "stapler", and
"staples" are all replaced by "staple." Common function
words ("a", "and", "the" . . . ) can influence the calculated
similarity of texts without regard to their topics, and so are
typically removed from the text before the scores of terms in
the text are computed. A more general approach to
recognizing synonyms is to use a revised measure of the distance
between textual attribute vectors V and U, namely

\[ \arccos\left( \frac{AV (AU)^{T}}{\sqrt{AV (AV)^{T} \, AU (AU)^{T}}} \right), \]

where the matrix A is
the dimensionality-reducing linear transformation (or an
approximation thereto) determined by collecting the vector
values of the textual attribute, for all target objects known to
the system, and applying singular value decomposition to
the resulting collection. The same approach can be applied
to the vector values of associative attributes.

The above
definitions allow us to determine how close together two
target objects are with respect to a single attribute, whether
numeric, associative, or textual. The distance between two
target objects X and Y with respect to their entire
multi-attribute profiles \(P_X\) and \(P_Y\) is then denoted \(d(X,Y)\) or
\(d(P_X, P_Y)\) and defined as:

\[ \Big( \big((\text{distance with respect to attribute } a)(\text{weight of attribute } a)\big)^{k} + \big((\text{distance with respect to attribute } b)(\text{weight of attribute } b)\big)^{k} + \big((\text{distance with respect to attribute } c)(\text{weight of attribute } c)\big)^{k} + \cdots \Big)^{1/k} \]

where k is a fixed positive real number, typically 2, and the
weights are non-negative real numbers indicating the
relative importance of the various attributes. For example, if the
target objects are consumer goods, and the weight of the
"color" attribute is comparatively very small, then color is
not a consideration in determining similarity: a user who
likes a brown massage cushion is predicted to show equal
interest in the same cushion manufactured in blue, and
vice-versa. On the other hand, if the weight of the "color"
attribute is comparatively very high, then users are predicted
to show interest primarily in products whose colors they
have liked in the past: a brown massage cushion and a blue
[illegible]
with one does not by itself inspire much interest in the other.
Target objects may be of various sorts, and it is sometimes
advantageous to use a single system that is able to compare
target objects of distinct sorts. For example, in a system
where some target objects are novels while other target
objects are movies, it is desirable to judge a novel and a
movie similar if their profiles show that similar users like
them (an associative attribute). However, it is important to
note that certain attributes specified in the movie's target
profile are undefined in the novel's target profile and vice
versa: a novel has no "cast list" associative attribute and a
movie has no "reading level" numeric attribute. In general,
a system in which target objects fall into distinct sorts may
sometimes have to measure the similarity of two target
objects for which somewhat different sets of attributes are
defined. This requires an extension to the distance metric
d(*,*) defined above. In certain applications, it is sufficient
when carrying out such a comparison simply to disregard
attributes that are not defined for both target objects: this
allows a cluster of novels to be matched with the most
similar cluster of movies, for example, by considering only
those attributes that novels and movies have in common.
However, while this method allows comparisons between
(say) novels and movies, it does not define a proper metric
over the combined space of novels and movies and therefore
does not allow clustering to be applied to the set of all target
objects. When necessary for clustering or other purposes, a
metric that allows comparison of any two target objects
(whether of the same or different sorts) can be defined as
follows. If a is an attribute, then let Max(a) be an upper
bound on the distance between two values of attribute a;
notice that if attribute a is an associative or textual attribute,
this distance is an angle determined by arccos, so that
Max(a) may be chosen to be 180 degrees, while if attribute
a is a numeric attribute, a sufficiently large number must be
selected by the system designers. The distance between two
values of attribute a is given as before in the case where both
values are defined; the distance between two undefined
values is taken to be zero; finally, the distance between a
defined value and an undefined value is always taken to be
Max(a)/2. This allows us to determine how close together
two target objects are with respect to an attribute a, even if

computes the similarities between seller-submitted profiles
and buyer-submitted profiles, and when two profiles match
closely (i.e., the similarity is above a threshold), the
corresponding seller and buyer are notified of each other's
[illegible] To prevent users from being flooded with
responses, it may be desirable to limit the number of
notifications each user receives to a fixed number, such as
[illegible]

[illegible] Feedback

A filtering system is a device that can search through
[illegible] target objects [illegible] estimate a given user's
interest in [illegible] system uses relevance feedback
[illegible] knowledge of the user's interests: whenever
the filtering system identifies a target object as potentially
interesting to a user, the user (if an on-line user)
provides feedback as to whether or not that target object
[illegible] of interest. Such feedback is stored long-term in
summarized form, as part of a database of user feedback
information, and may be provided either actively or
passively. In active feedback, the user explicitly indicates his or
[illegible] (no special interest) to 10 (great interest). In
passive feedback, the system infers the user's interest from
the user's behavior. For example, if target objects are textual
documents, the system might monitor which documents the
user chooses to read, or not to read, and how much time the
[illegible] A typical formula for assessing
interest in a document via passive feedback, in this domain,
on a scale of 0 to 10, might be:

+2 if the second page is viewed,
+2 if all pages are viewed,
+2 if more than 30 seconds was spent viewing the
document,
+2 if more than one minute was spent viewing the
document,
+2 if the minutes spent viewing the document are greater
[illegible] pages.

[illegible] electronic mail messages, interest
[illegible] particularly
[illegible] prompt reply. If the target objects are
purchasable goods, interest points might be added for target
[illegible] actually purchases, with further points
[illegible] large-quantity or high-price purchase. In any
[illegible] points might be added for target objects that
the user accesses early in a session, on the grounds that users

attribute a does not have a defined value for both target ^^"^^ intcrtst them first. Other poten- 

objects. The distance d(*,*) between two target objects 4th ^^"^ ^^'^^ feedback include an electronic mea- 
respect to their entire multi-atttibute profiles is then given in ^° ^urement of the extent to which the user's pupils dilate while 

terms of these individual attribute distances exacUy as user views the target object or a description of the target 

before. It is assumed that one attribute in such a system ^ Possible to combine active and passive feedback, 

specifies the sort of target object ("movie'', "novel" etc > ^P^"*" is to take a weighted average of the two ratings, 

and that this attribute may be highly weighted if 'tar^t ^"^^^^ ^ P^"^ feedback by default, but to 

objects of different sorts are considered to be very different ^ ^ examine and actively modify the passive 

despite any attributes they may have in common. feedback score. In the scenano above, for instance, an 

umnteresting article may sometimes remain on the display 

imUZING THE SIMILARITY MEASUREMENT device for a long period whHe the user is engaged in 

Matching Buyers and Sellers unrelated business; the passive feedback score is then inap- 

A sunple appHcation of the simUarity measurement is a 60 propriately high, and the user may wish to correct it before 

system to match buyers with sellers in smaU-volume continuing. In the preferred embodiment of the invention, a 

markets, such as used cars and other used goods, artwork, or visual indicator, such as a sliding bar or indicator needle on 

employment. SeUers submit profiles of the goods (target the user's screen, can be used to continuously display the 

objects) they want to sell, and buyers submit profiles of the passive feedback score estimated by the system for the target 

goods (target objects) they want to buy. Participants may 65 object being viewed, unless the user has manuaUy adjusted 

submit or withdraw these profiles at any time. The system the indicator by a mouse operation or other means in order 

for customized electtonic identification of desirable objects to reflect a different score for this taiget object, after which 
the indicator displays the active feedback score selected by
the user, and this active feedback score is used by the system
instead of the passive feedback score. In a variation, the user
cannot see or adjust the indicator until just after the user has
finished viewing the target object. Regardless of how a user's
feedback is computed, it is stored long-term as part of that
user's target profile interest summary.

Filtering: Determining Topical Interest Through Similarity

Relevance feedback only determines the user's interest in
certain target objects: namely, the target objects that the user
has actually had the opportunity to evaluate (whether
actively or passively). For target objects that the user has not
yet seen, the filtering system must estimate the user's
interest. This estimation task is the heart of the filtering
problem, and the reason that the similarity measurement is
important. More concretely, the preferred embodiment of the
filtering system is a news clipping service that periodically
presents the user with news articles of potential interest. The
user provides active and/or passive feedback to the system
relating to these presented articles. However, the system
does not have feedback information from the user for
articles that have never been presented to the user, such as
new articles that have just been added to the database, or old
articles that the system chose not to present to the user.
Similarly, in the dating service domain where target objects
are prospective romantic partners, the system has only
received feedback on old flames, not on prospective new
loves.

As shown in flow diagram form in FIG. 12, the evaluation
of the likelihood of interest in a particular target object for
a specific user can automatically be computed. The interest
that a given target object X holds for a user U is assumed to
be a sum of two quantities: q(U, X), the intrinsic "quality"
of X, plus f(U, X), the "topical interest" that users like U
have in target objects like X. For any target object X, the
intrinsic quality measure q(U, X) is easily estimated at steps
1201-1203 directly from numeric attributes of the target
object X. The computation process begins at step 1201,
where certain designated numeric attributes of target object
X are specifically selected, which attributes by their very
nature should be positively or negatively correlated with
users' interest. Such attributes, termed "quality attributes,"
have the normative property that the higher (or in some cases
lower) their value, the more interesting a user is expected to
find them. Quality attributes of target object X may include,
but are not limited to, target object X's popularity among
users in general, the rating a particular reviewer has given
target object X, the age (time since authorship, also known
as outdatedness) of target object X, the number of vulgar
words used in target object X, the price of target object X,
and the amount of money that the company selling target
object X has donated to the user's favorite charity. At step
1202, each of the selected attributes is multiplied by a
positive or negative weight indicative of the strength of user
U's preference for those target objects that have high values
for this attribute, which weight must be retrieved from a data
file storing quality attribute weights for the selected user. At
step 1203, a weighted sum of the identified weighted
selected attributes is computed to determine the intrinsic
quality measure q(U, X). At step 1204, the summarized
weighted relevance feedback data is retrieved, wherein some
relevance feedback points are weighted more heavily than
others and the stored relevance data can be summarized to
some degree, for example by the use of search profile sets.
The more difficult part of determining user U's interest in
target object X is to find or compute at step 1205 the value
of f(U, X), which denotes the topical interest that users like
U generally have in target objects like X. The method of
determining a user's interest relies on the following
heuristic: when X and Y are similar target objects (have similar
attributes), and U and V are similar users (have similar
attributes), then topical interest f(U, X) is predicted to have
a similar value to the value of topical interest f(V, Y). This
heuristic leads to an effective method because estimated
values of the topical interest function f(*, *) are actually
known for certain arguments to that function: specifically,
if user V has provided a relevance-feedback rating of r(V, Y)
for target object Y, then insofar as that rating represents user
V's true interest in target object Y, we have r(V, Y)=q(V,
Y)+f(V, Y) and can estimate f(V, Y) as r(V, Y)-q(V, Y).
Thus, the problem of estimating topical interest at all points
becomes a problem of interpolating among these estimates
of topical interest at selected points, such as the feedback
estimate of f(V, Y) as r(V, Y)-q(V, Y). This interpolation can
be accomplished with any standard smoothing technique,
using as input the known point estimates of the value of the
topical interest function f(*, *), and determining as output a
function that approximates the entire topical interest
function f(*, *).

Not all point estimates of the topical interest function f(*,
*) should be given equal weight as inputs to the smoothing
algorithm. Since passive relevance feedback is less reliable
than active relevance feedback, point estimates made from
passive relevance feedback should be weighted less heavily
than point estimates made from active relevance feedback,
or even not used at all. In most domains, a user's interests
may change over time and, therefore, estimates of topical
interest that derive from more recent feedback should also
be weighted more heavily. A user's interests may vary
according to mood, so estimates of topical interest that
derive from the current session should be weighted more
heavily for the duration of the current session, and past
estimates of topical interest made at approximately the
current time of day or on the current weekday should be
weighted more heavily. Finally, in domains where users are
trying to locate target objects of long-term interest
(investments, romantic partners, pen pals, employers,
employees, suppliers, service providers) from the possibly
meager information provided by the target profiles, the users
are usually not in a position to provide reliable immediate
feedback on a target object, but can provide reliable
feedback at a later date. An estimate of topical interest f(V, Y)
should be weighted more heavily if user V has had more
experience with target object Y. Indeed, a useful strategy is
for the system to track long-term feedback for such target
objects. For example, if target profile Y was created in 1990
to describe a particular investment that was available in
1990, and that was purchased in 1990 by user V, then the
system solicits relevance feedback from user V in the years
1990, 1991, 1992, 1993, 1994, 1995, etc., and treats these as
successively stronger indications of user V's true interest in
target profile Y, and thus as indications of user V's likely
interest in new investments whose current profiles resemble
the original 1990 investment profile Y. In particular, if in
1994 and 1995 user V is well-disposed toward his or her
1990 purchase of the investment described by target profile
Y, then in those years and later, the system tends to
recommend additional investments when they have profiles like
target profile Y, on the grounds that they too will turn out to
be satisfactory in 4 to 5 years. It makes these
recommendations both to user V and to users whose investment
portfolios and other attributes are similar to user V's. The
relevance feedback provided by user V in this case may be
either active (feedback=satisfaction ratings provided by the




investor V) or passive (feedback=difference between
average annual return of the investment and average annual
return of the Dow Jones index portfolio since purchase of the
investment, for example).

To effectively apply the smoothing technique, it is
necessary to have a definition of the similarity distance between
(U, X) and (V, Y), for any users U and V and any target
objects X and Y. We have already seen how to define the
distance d(X, Y) between two target objects X and Y, given
their attributes. We may regard a pair such as (U, X) as an
extended object that bears all the attributes of target X and
all the attributes of user U; then the distance between (U, X)
and (V, Y) may be computed in exactly the same way. This
approach requires user U, user V, and all other users to have
some attributes of their own stored in the system: for
example, age (numeric), social security number (textual),
and list of documents previously retrieved (associative). It is
these attributes that determine the notion of "similar users."
Thus it is desirable to generate profiles of users (termed
"user profiles") as well as profiles of target objects (termed
"target profiles"). Some attributes employed for profiling
users may be related to the attributes employed for profiling
target objects: for example, using associative attributes, it is
possible to characterize target objects such as X by the
interest that various users have shown in them, and
simultaneously to characterize users such as U by the interest that
they have shown in various target objects. In addition, user
profiles may make use of any attributes that are useful in
characterizing humans, such as those suggested in the
example domain above where target objects are potential
consumers. Notice that user U's interest can be estimated
even if user U is a new user or an off-line user who has never
provided any feedback, because the relevance feedback of
users whose attributes are similar to U's attributes is taken
into account.

For some uses of filtering systems, when estimating
topical interest, it is appropriate to make an additional
"presumption of no topical interest" (or "bias toward zero").
To understand the usefulness of such a presumption, suppose
the system needs to determine whether target object X is
topically interesting to the user U, but that users like user U
have never provided feedback on target objects even
remotely like target object X. The presumption of no topical
interest says that if this is so, it is because users like user U
are simply not interested in such target objects and therefore
do not seek them out and interact with them. On this
presumption, the system should estimate topical interest f(U,
X) to be low. Formally, this example has the characteristic
that (U, X) is far away from all the points (V, Y) where
feedback is available. In such a case, topical interest f(U, X)
is presumed to be close to zero, even if the value of the
topical interest function f(*, *) is high at all the faraway
surrounding points at which its value is known. When a
smoothing technique is used, such a presumption of no
topical interest can be introduced, if appropriate, by
manipulating the input to the smoothing technique. In addition to
using observed values of the topical interest function f(*, *)
as input, the trick is to also introduce fake observations of
the form topical interest f(V, Y)=0 for a lattice of points (V,
Y) distributed throughout the multidimensional space. These
fake observations should be given relatively low weight as
inputs to the smoothing algorithm. The more strongly they
are weighted, the stronger the presumption of no interest.

The following provides another simple example of an
estimation technique that has a presumption of no interest.
Let g be a decreasing function from non-negative real
numbers to non-negative real numbers, such as g(x)=e^-x or
g(x)=min(1, x^-k) where k>1. Estimate topical interest f(U,
X) with the following g-weighted average:

\[
f(U, X) = \frac{\sum_{(V, Y)} \bigl(r(V, Y) - q(V, Y)\bigr)\, g\bigl(\mathrm{distance}((U, X), (V, Y))\bigr)}{\sum_{(V, Y)} g\bigl(\mathrm{distance}((U, X), (V, Y))\bigr)}
\]

Here the summations are over all pairs (V, Y) such that
user V has provided feedback r(V, Y) on target object Y, i.e.,
all pairs (V, Y) such that relevance feedback r(V, Y) is
defined. Note that both with this technique and with
conventional smoothing techniques, the estimate of the topical
interest f(U, X) is not necessarily equal to r(U, X)-q(U, X),
even when r(U, X) is defined.

Filtering: Adjusting Weights and Residue Feedback

The method described above requires the filtering system
to measure distances between (user, target object) pairs, such
as the distance between (U, X) and (V, Y). Given the means
described earlier for measuring the distance between two
multi-attribute profiles, the method must therefore associate
a weight with each attribute used in the profile of (user,
target object) pairs, that is, with each attribute used to profile
either users or target objects. These weights specify the
relative importance of the attributes in establishing
similarity or difference, and therefore, in determining how topical
interest is generalized from one (user, target object) pair to
another. Additional weights determine which attributes of a
target object contribute to the quality function q, and by how
much.

It is possible and often desirable for a filtering system to
store a different set of weights for each user. For example,
a user who thinks of two-star films as having materially
different topic and style from four-star films wants to assign
a high weight to "number of stars" for purposes of the
similarity distance measure d(*, *); this means that interest
in a two-star film does not necessarily signal interest in an
otherwise similar four-star film, or vice-versa. If the user
also agrees with the critics, and actually prefers four-star
films, the user also wants to assign "number of stars" a high
positive weight in the determination of the quality function
q. In the same way, a user who dislikes vulgarity wants to
assign the "vulgarity score" attribute a high negative weight
in the determination of the quality function q, although the
"vulgarity score" attribute does not necessarily have a high
weight in determining the topical similarity of two films.
Attribute weights (of both sorts) may be set or adjusted by
the system administrator or the individual user, on either a
temporary or a permanent basis.

However, it is often desirable for the filtering system to
learn attribute weights automatically, based on relevance
feedback. The optimal attribute weights for a user U are
those that allow the most accurate prediction of user U's
interests. That is, with the distance measure and quality
function defined by these attribute weights, user U's interest
in target object X, q(U, X)+f(U, X), can be accurately
estimated by the techniques above. The effectiveness of a
particular set of attribute weights for user U can therefore be
gauged by seeing how well it predicts user U's known
interests.

Formally, suppose that user U has previously provided
feedback on target objects X_1, X_2, X_3, ..., X_n, and that the
feedback ratings are r(U, X_1), r(U, X_2), r(U, X_3), ..., r(U,
X_n). Values of feedback ratings r(*, *) for other users and
other target objects may also be known. The system may use
the following procedure to gauge the effectiveness of the set
of attribute weights it currently stores for user U: (i) For
each 1<=i<=n, use the estimation techniques to estimate
q(U, X_i)+f(U, X_i) from all known values of feedback ratings
r. Call this estimate a_i. (ii) Repeat step (i), but this time make
the estimate for each 1<=i<=n without using the feedback
ratings r(U, X_j) as input, for any j such that the distance d(X_i,
X_j) is smaller than a fixed threshold. That is, estimate each
q(U, X_i)+f(U, X_i) from other values of feedback rating r
only; in particular, do not use r(U, X_i) itself. Call this
estimate b_i. The difference a_i-b_i is herein termed the
"residue feedback r_res(U, X_i) of user U on target object X_i." (iii)
Compute user U's error measure,
(a_1-b_1)^2+(a_2-b_2)^2+(a_3-b_3)^2+ . . . +(a_n-b_n)^2.
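
The three steps above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes each (user, target object) pair has already been reduced to a numeric vector, uses plain Euclidean distance as a stand-in for the similarity distance, takes g(x)=e^-x as the weighting function, and treats every feedback value as already having the quality term q subtracted (q cancels in a_i-b_i in any case). The names `estimate` and `error_measure` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]              # numeric profile of a (user, target object) pair
Feedback = List[Tuple[Point, float]]   # (pair profile, point estimate r - q)


def g(x: float) -> float:
    """Decreasing weight function; g(x) = e^-x is one choice the text names."""
    return math.exp(-x)


def estimate(target: Point, feedback: Feedback) -> float:
    """g-weighted average of the known point estimates around `target`."""
    weights = [g(math.dist(target, point)) for point, _ in feedback]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # assumption: no usable feedback means no estimated interest
    return sum(w * value for w, (_, value) in zip(weights, feedback)) / total


def error_measure(own: Feedback, others: Feedback, threshold: float) -> float:
    """Sum of (a_i - b_i)^2 over the target objects that user U has rated."""
    error = 0.0
    for x_i, _ in own:
        a_i = estimate(x_i, own + others)                       # step (i)
        # Step (ii): drop U's own ratings that lie within `threshold` of X_i.
        kept = [(p, v) for p, v in own if math.dist(x_i, p) >= threshold]
        b_i = estimate(x_i, kept + others)
        error += (a_i - b_i) ** 2                               # step (iii)
    return error
```

Here `own` holds user U's feedback and `others` holds feedback from other users. Because all of U's ratings share the same user attributes, the distance between two of U's pair profiles stands in for the target-object distance d(X_i, X_j) in this sketch.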

A gradient-descent or other numerical optimization
method may be used to adjust user U's attribute weights so
that this error measure reaches a (local) minimum. This
approach tends to work best if the smoothing technique used
in estimation is such that the value of f(V, Y) is strongly
affected by the point estimate r(V, Y)-q(V, Y) when the latter
value is provided as input. Otherwise, the presence or
absence of the single input feedback rating r(U, X_i) in steps
(i)-(ii) may not make a_i and b_i very different from each
other. A slight variation of this learning technique adjusts a
single global set of attribute weights for all users, by
adjusting the weights so as to minimize not a particular
user's error measure but rather the total error measure of all
users. These global weights are used as a default initial
setting for a new user who has not yet provided any
feedback. Gradient descent can then be employed to adjust
this user's individual weights over time. Even when the
attribute weights are chosen to minimize the error measure
for user U, the error measure is generally still positive,
meaning that residue feedback from user U has not been
reduced to 0 on all target objects. It is useful to note that high
residue feedback from a user U on a target object X indicates
that user U liked target object X unexpectedly well given its
profile, that is, better than the smoothing model could
predict from user U's opinions on target objects with similar
profiles. Similarly, low residue feedback indicates that user
U liked target object X less than was expected. By definition,
this unexplained preference or dispreference cannot be the
result of topical similarity, and therefore must be regarded as
an indication of the intrinsic quality of target object X. It
follows that a useful quality attribute for a target object X is
the average amount of residue feedback r_res(V, X) from users
on that target object, averaged over all users V who have
provided relevance feedback on the target object. In a
variation of this idea, residue feedback is never averaged
indiscriminately over all users to form a new attribute, but
instead is smoothed to consider users' similarity to each
other. Recall that the quality measure q(U, X) depends on the
user U as well as the target object X, so that a given target
object X may be perceived by different users to have
different quality. In this variation, as before, q(U, X) is
calculated as a weighted sum of various quality attributes
that are dependent only on X, but then an additional term is
added, namely an estimate of r_res(U, X) found by applying
a smoothing algorithm to known values of r_res(V, X). Here
V ranges over all users who have provided relevance
feedback on target object X, and the smoothing algorithm is
sensitive to the distances d(U, V) from each such user V to
user U.
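
The two ways of turning residue feedback into a quality term can be contrasted with a short sketch. It is illustrative only: user profiles are assumed to be numeric vectors, Euclidean distance stands in for d(U, V), and the exponential weighting with a `scale` parameter is one hypothetical choice of smoothing, not one the text prescribes. Both functions assume at least one user has rated the target object.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, ...]


def average_residue(residues: Dict[str, float]) -> float:
    """Plain average of r_res(V, X) over all users V who rated target object X."""
    return sum(residues.values()) / len(residues)


def smoothed_residue(user: Vector,
                     residues: Dict[str, float],
                     profiles: Dict[str, Vector],
                     scale: float = 1.0) -> float:
    """Estimate r_res(U, X), weighting each user V by closeness to user U."""
    weights = {v: math.exp(-math.dist(user, profiles[v]) / scale)
               for v in residues}
    total = sum(weights.values())
    return sum(weights[v] * residues[v] for v in residues) / total
```

With the plain average, every user sees the same quality attribute for X; with the smoothed version, a user close to those who liked X unexpectedly well receives a higher additional quality term than a user close to those who did not.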

Using the Similarity Computation for Clustering

A method for defining the distance between any pair of
target objects was disclosed above. Given this distance
measure, it is simple to apply a standard clustering
algorithm, such as k-means, to group the target objects into
a number of clusters, in such a way that similar target objects
tend to be grouped in the same cluster. It is clear that the
resulting clusters can be used to improve the efficiency of
matching buyers and sellers in the application described in
section "Matching Buyers and Sellers" above: it is not
necessary to compare every buy profile to every sell profile,
but only to compare buy profiles and sell profiles that are
similar enough to appear in the same cluster. As explained
below, the results of the clustering procedure can also be
used to make filtering more efficient, and in the service of
querying and browsing tasks.
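
The efficiency gain for buyer and seller matching can be illustrated as follows. The sketch assumes that every buy profile and sell profile has already been assigned a cluster number by some clustering step; the function name `candidate_pairs` and the identifier strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def candidate_pairs(buy_clusters: Dict[str, int],
                    sell_clusters: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return only the (buy, sell) pairs whose profiles share a cluster."""
    sells_by_cluster: Dict[int, List[str]] = defaultdict(list)
    for sell_id, cluster in sell_clusters.items():
        sells_by_cluster[cluster].append(sell_id)
    return [(buy_id, sell_id)
            for buy_id, cluster in buy_clusters.items()
            for sell_id in sells_by_cluster.get(cluster, [])]
```

Only the returned pairs then need a full profile-to-profile distance computation, instead of every buy profile against every sell profile.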

The k-means clustering method is familiar to those skilled
in the art. Briefly put, it finds a grouping of points (target
profiles, in this case, whose numeric coordinates are given
by numeric decomposition of their attributes as described
above) to minimize the distance between points in the
clusters and the centers of the clusters in which they are
located. This is done by alternating between assigning each
point to the cluster which has the nearest center and then,
once the points have been assigned, computing the (new)
center of each cluster by averaging the coordinates of the
points (target profiles) located in this cluster. Other
clustering methods can be used, such as "soft" or "fuzzy" k-means
clustering, in which objects are allowed to belong to more
than one cluster. This can be cast as a clustering problem
similar to the k-means problem, but now the criterion being
optimized is a little different:

where C ranges over cluster numbers, i ranges over target
objects, x_i is the numeric vector corresponding to the profile
of target object number i, x̄_C is the mean of all the numeric
vectors corresponding to target profiles of target objects in
cluster number C, termed the "cluster profile" of cluster C,
d(*, *) is the metric used to measure distance between two
target profiles, and i_iC is a value between 0 and 1 that
indicates how much target object number i is associated with
cluster number C, where i is an indicator matrix with the
property that for each i, Σ_C i_iC = 1. For
k-means clustering, i_iC is either 0 or 1.
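
The alternating procedure and the membership-weighted criterion can be sketched together. Two assumptions are made here and should be read as such: the criterion is taken to be the sum over i and C of i_iC times d(x_i, x̄_C), which is a reconstruction from the symbol definitions above rather than a formula quoted from the text, and d is taken to be Euclidean distance. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise average of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def nearest(point: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points: List[Vector], centers: List[Vector],
           iterations: int = 10) -> Tuple[List[Vector], List[int]]:
    """Alternate between assigning points and recomputing cluster centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(grp) if grp else c for grp, c in zip(groups, centers)]
    return centers, [nearest(p, centers) for p in points]


def objective(points: List[Vector], centers: List[Vector],
              membership: List[List[float]]) -> float:
    """Assumed criterion: sum over i, C of membership[i][C] * d(x_i, center_C)."""
    return sum(membership[i][c] * math.dist(p, centers[c])
               for i, p in enumerate(points)
               for c in range(len(centers)))
```

Passing a 0/1 membership matrix to `objective` gives the hard k-means case; passing fractional rows that sum to 1 gives the soft case.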

Any of these basic types of clustering might be used by 
the system: 

1) Association-based clustering, in which profiles contain 
only associative attributes, and thus distance is defined 
entirely by associations. This kind of clustering gener- 
ally (a) clusters target objects based on the similarity of 
the users who like them or (b) clusters users based on 
the similarity of the target objects they like. In this 
approach, the system does not need any information
about target objects or users, except for their history of
interaction with each other.

2) Content-based clustering, in which profiles contain 
only non-associative attributes. This kind of clustering 
(a) clusters target objects based on the similarity of
their non-associative attributes (such as word 
frequencies) or (b) clusters users based on the similarity 
of their non-associative attributes (such as demograph- 
ics and psychographics). In this approach, the system 
does not need to record any information about users' 
historical patterns of information access, but it does 
need information about the intrinsic properties of users 
and/or target objects. 

3) Uniform hybrid method, in which profiles may contain 
both associative and non-associative attributes. This 
method combines 1a and 2a, or 1b and 2b. The distance
d(P_X, P_Y) between two profiles P_X and P_Y may be
computed by the general similarity-measurement
methods described earlier.
4) Sequential hybrid method. First apply the k-means
procedure to do 1a, so that articles are labeled by
cluster based on which user read them, then use
supervised clustering (maximum likelihood discriminant
methods) using the word frequencies to do the process
of method 2a described above. This tries to use
knowledge of who read what to do a better job of clustering
based on word frequencies. One could similarly
combine the methods 1b and 2b described above.

Hierarchical clustering of target objects is often useful.
Hierarchical clustering produces a tree which divides the
target objects first into two large clusters of roughly similar
objects; each of these clusters is in turn divided into two or
more smaller clusters, which in turn are each divided into yet
smaller clusters until the collection of target objects has been
entirely divided into "clusters" consisting of a single object
each, as diagrammed in FIG. 8. In this diagram, the node d
denotes a particular target object d, or equivalently, a
single-member cluster consisting of this target object. Target object
d is a member of the cluster (a, b, d), which is a subset of
the cluster (a, b, c, d, e, f), which in turn is a subset of all
target objects. The tree shown in FIG. 8 would be produced
from a set of target objects such as those shown
geometrically in FIG. 7. In FIG. 7, each letter represents a target
object, and axes x1 and x2 represent two of the many
numeric attributes on which the target objects differ. Such a
cluster tree may be created by hand, using human judgment 
to form clusters and subclusters of similar objects, or may be 
created automatically in either of two standard ways: top- 
down or bottom-up. In top-down hierarchical clustering, the 
set of all target objects in FIG. 7 would be divided into the 
clusters (a, b, c, d, e, f) and (g, h, i, j, k). The clustering 
algorithm would then be reapplied to the target objects in 
each cluster, so that the cluster (g, h, i, j, k) is subpartitioned
into the clusters (g, k) and (h, i, j), and so on to arrive at the
tree shown in FIG. 8. In bottom-up hierarchical clustering,
the set of all target objects in FIG. 7 would be grouped into
numerous small clusters, namely (a, b), d, (c, f), e, (g, k), (h,
i), and j. These clusters would then themselves be grouped
into the larger clusters (a, b, d), (c, e, f), (g, k), and (h, i, j),
according to their cluster profiles. These larger clusters
would themselves be grouped into (a, b, c, d, e, f) and (g, k,
h, i, j), and so on until all target objects had been grouped
together, resulting in the tree of FIG. 8. Note that for
bottom-up clustering to work, it must be possible to apply
the clustering algorithm to a set of existing clusters. This
requires a notion of the distance between two clusters. The 
method disclosed above for measuring the distance between 
target objects can be applied directly, provided that clusters 
are profiled in the same way as target objects. It is only 
necessary to adopt the convention that a cluster's profile is 
the average of the target profiles of all the target objects in 
the cluster; that is, to determine the cluster's value for a 
given attribute, take the mean value of that attribute across
all the target objects in the cluster. For the mean value to be
well-defined, all attributes must be numeric, so it is
necessary as usual to replace each textual or associative attribute
with its decomposition into numeric attributes (scores), as
described earlier. For example, the target profile of a single
Woody Allen film would assign "Woody-Allen" a score of 1
in the "name-of-director" field, while giving
"Federico-Fellini" and "Terence-Davies" scores of 0. A cluster that
consisted of 20 films directed by Allen and 5 directed by
Fellini would be profiled with scores of 0.8, 0.2, and 0
respectively, because, for example, 0.8 is the average of 20
ones and 5 zeros.
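
The averaging convention can be checked against the director example with a short sketch. The representation of a profile as a dictionary of named numeric scores, and the function name `cluster_profile`, are assumptions made for illustration; a score missing from a profile is treated as 0.

```python
from typing import Dict, List

Profile = Dict[str, float]


def cluster_profile(profiles: List[Profile]) -> Profile:
    """Average each numeric score across all target profiles in the cluster."""
    keys = set().union(*profiles)
    return {key: sum(p.get(key, 0.0) for p in profiles) / len(profiles)
            for key in sorted(keys)}


# Director scores for a single film by each director.
allen = {"Woody-Allen": 1.0, "Federico-Fellini": 0.0, "Terence-Davies": 0.0}
fellini = {"Woody-Allen": 0.0, "Federico-Fellini": 1.0, "Terence-Davies": 0.0}

# A cluster of 20 Allen films and 5 Fellini films.
profile = cluster_profile([allen] * 20 + [fellini] * 5)
```

Because the result has the same shape as a single target profile, the same distance computation can be applied between two clusters, or between a cluster and a target object.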
Searching for Target Objects

Given a target object with target profile P, or alternatively
given a search profile P, a hierarchical cluster tree of target
objects makes it possible for the system to search efficiently
for target objects with target profiles similar to P. It is only
necessary to navigate through the tree, automatically, in
search of such target profiles. The system for customized
electronic identification of desirable objects begins by
considering the largest, top-level clusters, and selects the cluster
whose profile is most similar to target profile P. In the event
of a near-tie, multiple clusters may be selected. Next, the
system considers all subclusters of the selected clusters, and
this time selects the subcluster or subclusters whose profiles
are closest to target profile P. This refinement process is
iterated until the clusters selected on a given step are
sufficiently small, and these are the desired clusters of target
objects with profiles most similar to target profile P. Any
hierarchical cluster tree therefore serves as a decision tree
for identifying target objects. In pseudo-code form, this
process is as follows (and in flow diagram form in FIGS.
13A and 13B):

1. Initialize list of identified target objects to the empty list 
at step 13A00. 

2. Initialize the current tree T to be the hierarchical cluster 
tree of all objects at step 13A01, and at step 13A02 scan 
the current cluster tree for target objects similar to P, 
using the process detailed in FIG. 13B. At step 13A03, 
the list of target objects is returned. 

3. At step 13B00, the variable i is set to 1 and, for each 
child subtree Ti of the root of tree T, the cluster profile pi 
of Ti is retrieved. 

4. At step 13B02, calculate d(P, pi), the similarity distance 
between P and pi. 

5. At step 13B03, if d(P, pi) < t, a threshold, branch to one 
of two options: 

6. If tree Ti contains only one target object at step 13B04, 
add that target object to list of identified target objects 
at step 13B05 and advance to step 13B07. 

7. If tree Ti contains multiple target objects at step 13B04, 
scan the ith child subtree for target objects similar to P 
by invoking the steps of the process of FIG. 13B 
recursively and then recurse to step 3 (step 13A01 in 
FIG. 13A) with T bound for the duration of the recursion 
to tree Ti, in order to search in tree Ti for target 
objects with profiles similar to P. 
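
The seven steps above can be sketched as a short recursive routine. This is only an illustrative reading of the pseudo-code: the `ClusterNode` class, the function names, the use of a single fixed threshold, and the Euclidean distance are all assumptions (the description defines its own similarity distance elsewhere, and suggests level-dependent thresholds).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterNode:
    """A node of the cluster tree; leaves hold exactly one target object."""
    profile: Dict[str, float]
    target_object: Optional[str] = None
    children: List["ClusterNode"] = field(default_factory=list)


def distance(p, q):
    """Stand-in for d(P, pi): Euclidean distance over numeric attributes."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def scan(tree, search_profile, threshold, found):
    """Steps 3-7: visit each child subtree Ti and descend where d(P, pi) < t."""
    for child in tree.children:
        if distance(search_profile, child.profile) < threshold:
            if child.target_object is not None:   # step 6: single target object
                found.append(child.target_object)
            else:                                 # step 7: recurse into Ti
                scan(child, search_profile, threshold, found)


def search(root, search_profile, threshold):
    """Steps 1-2: start with an empty list and scan the whole tree."""
    found = []
    scan(root, search_profile, threshold, found)
    return found


# Hypothetical two-cluster tree used only to exercise the sketch.
a = ClusterNode({"x": 1.0, "y": 0.0}, target_object="a")
b = ClusterNode({"x": 0.8, "y": 0.0}, target_object="b")
c = ClusterNode({"x": 0.0, "y": 1.0}, target_object="c")
d = ClusterNode({"x": 0.0, "y": 0.8}, target_object="d")
left = ClusterNode({"x": 0.9, "y": 0.0}, children=[a, b])
right = ClusterNode({"x": 0.0, "y": 0.9}, children=[c, d])
root = ClusterNode({"x": 0.45, "y": 0.45}, children=[left, right])
```

With this tree, `search(root, {"x": 1.0, "y": 0.0}, 0.15)` descends only into the left cluster, so the right cluster's target objects are never compared against the search profile.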

In step 5 of this pseudo-code, smaller thresholds are 
typically used at lower levels of the tree, for example by 
making the threshold an affine function or other function of 
the cluster variance or cluster diameter of the cluster pi. If 
the cluster tree is distributed across a plurality of servers, as 
described in the section of this description titled "Network 
Context of the Browsing System", this process may be 
executed in distributed fashion as follows: steps 3-7 are 
executed by the server that stores the root node of hierarchical 
cluster tree T, and the recursion in step 7 to a 
subcluster tree Ti involves the transmission of a search 
request to the server that stores the root node of tree Ti, 
which server carries out the recursive step upon receipt of 
this request. Steps 1-2 are carried out by the processor that 
initiates the search, and the server that executes step 6 must 
send a message identifying the target object to this initiating 
processor, which adds it to the list. 
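
As a hedged illustration of a level-dependent threshold, the sketch below derives the threshold from the cluster diameter by an affine function. The slope and intercept values, the function names, and the representation of profiles as numeric tuples are assumptions chosen for the example; the description does not specify them.

```python
import math
from itertools import combinations


def cluster_diameter(member_vectors):
    """Largest pairwise Euclidean distance among members (0 for one member)."""
    return max(
        (math.dist(a, b) for a, b in combinations(member_vectors, 2)),
        default=0.0,
    )


def affine_threshold(diameter, slope=0.5, intercept=0.05):
    """Threshold t = slope * diameter + intercept (illustrative constants)."""
    return slope * diameter + intercept


# Hypothetical clusters: a broad top-level cluster and a tight low-level one.
top_level = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
low_level = [(0.0, 0.0), (0.3, 0.4)]
top_threshold = affine_threshold(cluster_diameter(top_level))
low_threshold = affine_threshold(cluster_diameter(low_level))
```

Because low-level clusters are tighter, their diameters and hence their thresholds are smaller, which is the behavior the paragraph above describes.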

Assuming that low-level clusters have already been 
formed through clustering, there are alternative search methods 
for identifying the low-level cluster whose profile is 
most similar to a given target profile P. A standard 
back-propagation neural net is one such method: it should be 
trained to take the attributes of a target object as input, and 
produce as output a unique pattern that can be used to 
identify the appropriate low-level cluster. For maximum 
accuracy, low-level clusters that are similar to each other 
(close together in the cluster tree) should be given similar 
identifying patterns. Another approach is a standard decision 
tree that considers the attributes of target profile P one at a 
time until it can identify the appropriate cluster. If profiles 
are large, this may be more rapid than considering all 
attributes. A hybrid approach to searching uses distance 
measurements as described above to navigate through the 
top few levels of the hierarchical cluster tree, until it reaches 
a cluster of intermediate size whose profile is similar to 
target profile P, and then continues by using a decision tree 
specialized to search for low-level subclusters of that 
intermediate cluster. 

One use of these searching techniques is to search for 
target objects that match a search profile from a user's search 
profile set. This form of searching is used repeatedly in the 
news clipping service, active navigation, and Virtual 
Community Service applications, described below. Another use is 
to add a new target object quickly to the cluster tree. An 
existing cluster that is similar to the new target object can be 
located rapidly, and the new target object can be added to 
this cluster. If the object is beyond a certain threshold 
distance from the cluster center, then it is advisable to start 
a new cluster. Several variants of this incremental clustering 
scheme can be used, and can be built using variants of 
subroutines available in advanced statistical packages. Note 
that various methods can be used to locate the new target 
objects that must be added to the cluster tree, depending on 
the architecture used. In one method, a "webcrawler" program 
running on a central computer periodically scans all 
servers in search of new target objects, calculates the target 
profiles of these objects, and adds them to the hierarchical 
cluster tree by the above method. In another, whenever a 
new target object is added to any of the servers, a software 
"agent" at that server calculates the target profile and adds 
it to the hierarchical cluster tree by the above method. 

Rapid Profiling 

In some domains, complete profiles of target objects are 
not always easy to construct automatically. When target 
objects are multimedia, for example, an attribute such as 
"genre" (a single textual term such as "Action", 
"Suspense/Thriller", "Word Games", etc.) may be a matter of judgment 
and opinion, difficult to determine except by consulting a 
human. More significantly, if each title has an associated 
attribute that records the positive or negative relevance 
feedback to that title from various human users (consumers), 
then all the association scores of any newly introduced title 
are initially zero, so that it is initially unclear what other titles 
are similar to the new title with respect to the users who like 
them. Indeed, if this associative attribute is highly weighted, 
the initial lack of relevance feedback information may be 
difficult to remedy, due to a vicious circle in which users of 
moderate-to-high interest are needed to provide relevance 
feedback but relevance feedback is needed to identify users 
of moderate-to-high interest. 

Fortunately, however, it is often possible in principle to 
determine certain attributes of a new target object by 
extraordinary methods, including but not limited to methods 
that consult a human. For example, the system can in 
principle determine the genre of a title by consulting one or 
more randomly chosen individuals from a set of human 
experts, while for determining the score between a new title and 
a particular user it can in principle show the title to that user 
and determine relevance feedback. Since such requests 
inconvenience people, however, it is important not to 
determine all difficult attributes this way, but only the ones that 
are most important in classifying the article. "Rapid profiling" 
is a method for selecting those numeric attributes that 
are most important to determine. (Recall that all attributes 
can be decomposed into numeric attributes, such as 
association scores or term scores.) First, a set of existing target 
objects that already have complete or largely complete 
profiles are clustered using a k-means algorithm. Next, each 
of the resulting clusters is assigned a unique identifying 
number, and each clustered target object is labeled with the 
identifying number of its cluster. Standard methods then 
allow construction of a single decision tree that can 
determine any target object's cluster number, with substantial 
accuracy, by considering the attributes of the target object, 
one at a time. Only attributes that can if necessary be 
determined for any new target object are used in the 
construction of this decision tree. To profile a new target object, 
the decision tree is traversed downward from its root as far 
as is desired. The root of the decision tree considers some 
attribute of the target object. If the value of this attribute is 
not yet known, it is determined by a method appropriate to 
that attribute; for example, if the attribute is the association 
score of the target object with user #4589, then relevance 
feedback (to be used as the value of this attribute) is solicited 
from user #4589, perhaps by the ruse of adding the possibly 
uninteresting target object to a set of objects that the system 
recommends to the user's attention, in order to find out what 
the user thinks of it. Once the root attribute is determined, 
the rapid profiling method descends the decision tree by one 
level, choosing one of the decision subtrees of the root in 
accordance with the determined value of the root attribute. 
The root of this chosen subtree considers another attribute of 
the target object, whose value is likewise determined by an 
appropriate method. The process can be repeated to 
determine as many attributes as desired, by whatever methods are 
available, although it is ordinarily stopped after a small 
number of attributes, to avoid the burden of determining too 
many attributes. 

It should be noted that the rapid profiling method can be 
used to identify important attributes in any sort of profile, 
and not just profiles of target objects. In particular, recall that 
the disclosed method for determining topical interest 
through similarity requires users as well as target objects to 
have profiles. New users, like new target objects, may be 
profiled or partially profiled through the rapid profiling 
process. For example, when user profiles include an 
associative attribute that records the user's relevance feedback 
on all target objects in the system, the rapid profiling 
procedure can rapidly form a rough characterization of a 
new user's interests by soliciting the user's feedback on a 
small number of significant target objects, and perhaps also 
by determining a small number of other key attributes of the 
new user, by on-line queries, telephone surveys, or other 
means. Once the new user has been partially profiled in this 
way, the methods disclosed above predict that the new user's 
interests resemble the known interests of other users with 
similar profiles. In a variation, each user's user profile is 
subdivided into a set of long-term attributes, such as 
demographic characteristics, and a set of short-term attributes that 
help to identify the user's temporary desires and emotional 
state, such as the user's textual or multiple-choice answers 
to questions whose answers reflect the user's mood. A subset 
of the user's long-term attributes are determined when the 
user first registers with the system, through the use of a rapid 
profiling tree of long-term attributes. In addition, each time 
the user logs on to the system, a subset of the user's 
short-term attributes are additionally determined, through 
the use of a separate rapid profiling tree that asks about 
short-term attributes. 

Market Research 

A technique similar to rapid profiling is of interest in 
market research (or voter research). Suppose that the target 
objects are consumers. A particular attribute in each target 
profile indicates whether the consumer described by that 
target profile has purchased product X. A decision tree can 
be built that attempts to determine what value a consumer 
has for this attribute, by consideration of the other attributes 
in the consumer's profile. This decision tree may be traversed 
to determine whether additional users are likely to 
purchase product X. More generally, the top few levels of 
the decision tree provide information, valuable to advertisers 
who are planning mass-market or direct-mail campaigns, 
about the most significant characteristics of consumers of 
product X. 

Similar information can alternatively be extracted from a 
collection of consumer profiles without recourse to a decision 
tree, by considering attributes one at a time, and 
identifying those attributes on which product X's consumers 
differ significantly from its non-consumers. These techniques 
serve to characterize consumers of a particular product; 
they can be equally well applied to voter research or 
other survey research, where the objective is to characterize 
those individuals from a given set of surveyed individuals 
who favor a particular candidate, hold a particular opinion, 
belong to a particular demographic group, or have some other 
set of distinguishing attributes. Researchers may wish to 
purchase batches of analyzed or unanalyzed user profiles 
from which personal identifying information has been 
removed. As with any statistical database, statistical conclusions 
can be drawn, and relationships between attributes can 
be elucidated using knowledge discovery techniques which 
are well known in the art. 

CONSUMER-BASED BETTER BUSINESS BUREAU 

In the case of profiling new products, a decision tree may 
be useful for determining its profile quickly (for example if 
certain general attributes are known about the product). 
Rapid profiling may also be used to automatically present a 
selection of attributes (of at least two) with which a user 
selects which attribute most aptly describes the product 
and/or provides a weighted value of its relevance thereto. 
Alternatively, the decision tree presents (for each node) at 
least one exemplar item which the user rates indicating the 
degree of similarity between the system presented item(s) 
and the new item of interest. Additionally, for the sake of 
optimizing the confidence of the users being surveyed, the 
decision tree may also identify the user whose profiles 
suggest the greatest degree of similarity with the attributes 
or items being presented as queries. In one variation in this 
regard, the system selects users which are most familiar with 
two or more competitive products. The system performs a 
rapid profiling of these users, however, for product attributes 
which are most relevant to both products (which is produced 
from the result of combining or averaging both product 
profiles). Example attributes which are most telling about 
the user's perception of comparative value and quality when 
making a selection may include: performance, aesthetics, 
comfort, convenience of use, value, overall satisfaction, 
personal preference, as well as other relevant specific product 
attributes which may be determined as a part of the 
user's profile. By applying this technique over multiple 
product brands within a given category, a relative, comparative 
measure can be determined through averaging of results 
across all participating users on an attribute specific basis. 
Using the techniques described above which allow for 
pseudonymous credentialing of users or organizations by 
other entities, these evaluation based attributes may be 
automatically ascribed to each product in the form of 
credentials; also manually ascribed comments or descriptions 
may be (provided and subsequently rated by other 
users) to further leverage consumer participation in adding 
characterization attributes to a given product's or entity's 
profile. These averaged consumer rating based credentials 
also act as a means of normalizing biased opinions or rogue 
attempts to defame a product or entity and thus are used to 
substantiate claims which consumers have provided and 
other consumers have substantiated either in the form of 
on-line or off-line advertisements and coupons. Comparative 
ratings of competitive products are achievable by targeting 
users which have experience with (two or more) products 
being compared. The most relevant attributes which both 
products share are presented using these rapid profiling 
techniques. In order to develop a truly robust statistically 
confident comparison across all products on an attribute by 
attribute basis, it is important to use this comparative product 
rating approach, to identify automatically which product 
comparisons are most statistically relevant in order to provide 
statistical confidence for all products being evaluated 
(in this comparative product context) to validation of the 
values of each attribute using different combinations of 
product comparisons is important in order to assure statistical 
confidence (between different users). These rated 
attribute credentials may also be segmented by user types 
using knowledge discovery techniques. For example, it is 
possible that users of a certain demographic, product affinity 
or other attribute type may have different preferences, 
demands or expectations, thus may evaluate a product's 
overall quality or value (or other product attribute) differently. 
[...] credentials may be provided as 
resolution credentials, for example in combination with a 
credential provided by a neutral third party which proves 
that the user is in good standing with its customers (that a 
"significant" number of complaints were not submitted). 
Brokerage exchanges which match buyers and sellers and/or 
act as a directory thereof may wish to apply these techniques 
in order to provide users with some unbiased feedback from 
peers about products and services being solicited peer to 
peer rating based resolution credentials. It is also possible to 
automatically present a set of survey questions to a group of 
users who have been previously interacting on-line with 
another user. Because of the subjective nature involved in 
characterizing individuals based upon their personal, or even 
professional proficiencies and weaknesses, human involvement 
in providing manual characterizations of a sample of 
users is necessary. The nature of the interaction (an 
associate, professional, personal, or social) may be determined 
through automatic means (based on the content 
profiles of dialogues and lists of "similar" users which they 
interact with) in order to automatically ascribe an associative 
attribute which identifies both other individuals, his/her 
relationship with the user and the nature of their interaction. 
Individuals may be automatically presented with targeted 
questions appropriate to the nature thereof in accordance 
with their mutual relationship through anticipation of which 
attributes or queries other individuals (like friends, 
associates, business partners or employers) are most likely 
to request in the future. These questions are ideally 
requested from multiple users, their values are then averaged 
and may be ascribed to that user as resolution credentials. In 
case of disputes, mediation by an adjudicating third party may be 
required. Additionally, the system may further anticipate the 
types of questions which are most likely to be requested by 
other users in the future. This approach may also be used by 
the system to profile skills sets, qualifications, issues of 
personality, character or qualification to perform a particular 
task. It may also direct queries to the users most likely to be 
qualified or knowledgeable in certain popular domains, which 
are most likely to be relevant (and thus anticipate the types 
of queries that other users are likely to request). Similarly, 
users may be used to answer questions or provide descriptive 
characterizations of certain tasks or queries using rapid 
profiling in this way as well. Thus tasks (consulting on the 
Internet, intranet, etc.) may be profiled according to the types 
of users who ascribe subjective or objective attributes to 
best describe the task, or attributes may be ascribed which 
characterize the most appropriate individuals according to 
their professional qualifications or other relevant attributes, 
such as the tasks which they have successfully performed. 
Accordingly, task attributes may also be conveyed to the 
best candidates to whom these tasks are directed. As 
suggested, task performance may be manually evaluated in 
order to provide the system with a source of performance 
based relevance feedback. The users who submitted the task 
offers are given the opportunity to provide an evaluation of 
the level of the quality of the work (or query response) as 
well as overall satisfaction regarding the response to the 
request offer. The requester may provide an evaluation in the 
form of a set of feedback comments. Additionally, the rapid 
profiling technique will automatically generate a set of the 
most relevant attributes in the form of a survey which allow 
the user to rate the attributes according to each relevant 
attribute parameter as perceived by the user. (These 
attributes may, of course, include those which are humanly 
ascribed as well). Unlike the method for automatic query 
routing, the current system for finding optimal user skill 
profiles to match the particular submitted task description, 
the current system potentially embodies a much more complex 
knowledge construction requiring precision-oriented 
statistical knowledge about the nature of the user's numerous 
skill sets and the submitted tasks. 

It may be very useful to use associative attributes to 
identify the relevant words in the task description and users 
who successfully provided solutions and responses to similarly 
described tasks in the past. According to the previously 
described techniques of the patent, the collection of target 
objects in this particular information domain include task 
descriptions; solutions to the requests, individuals who have 
provided solutions to those tasks, individuals whose profiles 
qualify them for solving particular problem types, and 
individuals who are most likely to have a need for solution 
to a particular type of problem. As suggested each of these 
types of target objects may constitute the information space 
of the presently described system for customized electronic 
identification of desirable objects. Thus in order to augment 
the search retrieval process the user may also be directed to 
potentially useful information through menu browsing and 
search query navigation (and nearest neighbor, target object 
to target object) navigation down or across the menu as well 
as the current matching of appropriate users with requests 
are herein described. Accordingly, as relevant in the other 
informational domains (if the target object profiles) and the 
similarity between target objects is not statistically confident 
the system will cross correlate the statistical data from other 
informational domains in order to assign the most appropriate 
profile for each of target object for which a sparse data 
problem currently exists. In a more advanced embodiment, 
profiling of target objects in this complex domain may be 
further enhanced by establishing exception in the form of 
special appropriateness function rules between the textual 
descriptive, and numeric attributes of those targeted objects 
(e.g. the qualification of the users, the textual attributes in 
the description of each task, and the evaluative description 
of the recipients of the task solutions provided). As in other 
informational domains, the exception rules which apply to a 
particular domain are given priority over those which apply 
to another domain. (Again, where cross correlation statistics 
are given second priority in order to maximize statistical 
confidence). Such exception rules may include (but are not 
limited to) giving special relevance between a word attribute 
based upon the sequence in which those textual attributes 
appear in the description, (or in the presence or absence of 
a numeric attribute in combination with a numeric attribute 
or a textual attribute). (These associations may also be based 
on their relative frequencies in the text as well) or more 
complex rules may be established automatically. 
Furthermore, if the combination of words appear, and the 
request is from a particular user it is likely that a particular 
detailed target profile is appropriate for the target object. By 
definition, exception rules apply exceptions in the weighting 
values of attributes or an attribute with an exception is 
present (or at least one of) at least three attributes which are 
present in a particular (user or target object) profile whose 
attribute weighting influence upon another attribute would 
not otherwise be recognized in a pure (non-rule based) 
statistical model. (Customized) profiles of requests which is 
specific to each user may be used as each user may submit 
similar requests in a different descriptive manner (with 
varying word usage). The user's needs may also vary based 
upon the context of what actions the user has recently 
performed e.g., searching through particular topics of the 
World Wide Web, searching through e-mail, conversing with 
particular users about a particular topic or engaging in these 
activities at certain times or in conjunction with any of the 
above which may indicate the context of the user's mode of 
activities such as work, leisure or academics. If a particular 
combination of words appears and it is from a particular 
request as part of the description of a request from a 
particular individual, the relevance of each attribute component 
of the request may be different to some degree than 
the request from a different individual (wherein this case 
these exception rules are relevant to particular users). 
Accordingly, the sequence of words which appear (for a 
particular word combination) may be suggestive of the 
relative importance of particular words to one another or to 
a particular solution or a particular individual. Accordingly 
in the application to matching queries or tasks with users 
according to their qualifications for the particular combination 
of qualifying credentials which a user possesses may 
indicate an exception rule either between particular 
credentials, between credentials and individual tasks (or 
between credentials and textual attributes in the text of task 
descriptions). Exception rules are not applicable for associative 
attributes which associate target objects, users (or 
both) via the present similarity based techniques. 

SUPPORTING ARCHITECTURE 

The following section describes the preferred computer 
and network architecture for implementing the methods 
described in this patent. 

Electronic Media System Architecture 

FIG. 1 illustrates in block diagram form the overall 
architecture of an electronic media system, known in the art, 
in which the system for customized electronic identification 
of desirable objects of the present invention can be used to 
provide user customized access to target objects that are 
available via the electronic media system. In particular, the 
electronic media system comprises a data communication 
facility that interconnects a plurality of users with a number 
of information servers. The users are typically individuals, 
whose personal computers (terminals) T1-Tn are connected 
via a data communications link, such as a modem and a 
telephone connection established in well-known fashion, to 
a telecommunication network N. User information access 
software is resident on the user's personal computer and 
serves to communicate over the data communications link 
and the telecommunication network N with one of the 
plurality of network vendors V1-Vk (America Online, 
Prodigy, CompuServe, other private companies or even 
universities) who provide data interconnection service with 
selected ones of the information servers I1-Im. The user can, 
by use of the user information access software, interact with 
the information servers to request and obtain access to 
data that resides on mass storage systems SS1-SSm that are part 
of the information server apparatus. New data is input to this 
system by users via their personal computers T1-Tn and by 
commercial information services by populating their mass 
storage systems SS1-SSm with commercial data. Each user 
terminal T1-Tn and the information servers have 
phone numbers or IP addresses on the network N which 
enable a data communication link to be established between 
a particular user terminal T1-Tn and the selected information 
server I1-Im. A user's electronic mail address also uniquely 
identifies the user and the user's network vendor V1-Vk in 
an industry-standard format such as: username@aol.com or 
username@netcom.com. The network vendors V1-Vk provide 
access passwords for their subscribers (selected users), 
through which the users can access the information servers 
I1-Im. The subscribers pay the network vendors V1-Vk for 
the access services on a fee schedule that typically includes 
a monthly subscription fee and usage based charges. A 
difficulty with this system is that there are numerous 
information servers I1-Im located around the world, each of 
which provides access to a set of information of differing 
format, content and topics and via a cataloging system that 
is typically unique to the particular information server. 

The information is comprised of individual "files," which 
can contain audio data, video data, graphics data, text data, 
structured database data and combinations thereof. In the 
terminology of this patent, each target object is associated 
with a unique file: for target objects that are informational in 
nature and can be digitally represented, the file directly 
stores the informational content of the target object, while 
for target objects that are not stored electronically, such as 
purchasable goods, the file contains an identifying description 
of the target object. Target objects stored electronically 
as text files can include commercially provided news 
articles, published documents, letters, user-generated 
documents, descriptions of physical objects, or combinations 
of these classes of data. The organization of the files 
containing the information and the native format of the data 
contained in files of the same conceptual type may vary by 
information server I1-Im. 

Thus, a user can have difficulty in locating files that 
contain the desired information, because the information 
may be contained in files whose information server catalog- 
[...] excerpt a segment of the information that may be relevant to 
the user from the plethora of information that is generated 
and populated on this system. Even if the user commits the 
necessary resources to this task, existing information 
retrieval processes lack the accuracy and efficiency to ensure 
that the user obtains the desired information. It is obvious 
that within the constructs of this electronic media system, 
the three modules of the system for customized electronic 
identification of desirable objects can be implemented in a 
distributed manner, even with various modules being implemented 
on and/or by different vendors within the electronic 
media system. For example, the information servers I1-Im 
can include the target profile generation module while the 
network vendors V1-Vk may implement the user profile 
generation module, the target profile interest summary generation 
module, and/or the profile processing module. A 
module can itself be implemented in a distributed manner 
with numerous nodes being present in the network N, each 
node serving a population of users in a particular geographic 
area. The totality of these nodes comprises the functionality 
of the particular module. Various other partitions of the 
modules and their functions are possible and the examples 
provided herein represent illustrative examples and are not 
intended to limit the scope of the claimed invention. For the 
purposes of pseudonymous creation and update of users' 
target profile interest summaries (as described below), the 
vendors V1-Vk may be augmented with some number of 
proxy servers, which provide a mechanism for ongoing 
pseudonymous access and profile building through the 
method described herein. At least one trusted validation 
server must be in place to administer the creation of pseudonyms 
in the system. 

An important characteristic of this system for customized 
electronic identification of desirable objects is its 
responsiveness, since the intended use of the system is in an 
interactive mode. The system utility grows with the number 
of the users and this increases the number of possible 
consumer/product relationships between users and target 
objects. A system that serves a large group of users must 
maintain interactive performance and the disclosed method 
for profiling and clustering target objects and users can in 
turn be used for optimizing the distribution of data among 
the members of a virtual community and through a data 
communications network, based on users' target profile 
interest summaries. 

Network Elements and System Characteristics 

The various processors interconnected by the data communication 
network N as shown in FIG. 1 can be divided 
into two classes and grouped as illustrated in FIG. 2: clients 
and servers. The clients C1-Cn are individual user's computer 
systems which are connected to servers S1-S5 at 
various times via data communications links. Each of the 
clients Ci is typically associated with a single server Sj, but 
these associations can change over time. The clients C1-Cn 
both interface with users and produce and retrieve files to 
and from servers. The clients C1-Cn are not necessarily 
continuously on-line, since they typically serve a single user 
and can be movable systems, such as laptop computers, 
which can be connected to the data communications network 
N at any of a number of locations. Clients could also be a 
variety of other computers, such as computers and kiosks 
providing access to customized information as well as 
ing may not enable the user to locate them. Furthermore, targeted advertising to many users, where the users identify 
there is no standard catalog that defines the presence and themselves with passwords or with smart cards. A server Si 
services provided by all information servers I^-I^. A user 6S is a compujer system that is presumed to be continuously 
therefore does not have simple access to information but on-line ^and funct ions to both collect files trom various 
must expend a significant amount of time and energy to sources on the data communication network N for access by 
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local clients Cl-Cn and col lect files ficpm local clien ts 
Cl^nj or access bv remote' cli ents. The server Si is 
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equipped with p ersisteat storage , such as a magnetic disk 
data storage medium, and is interconnected with other
servers via data communications links. The data communications
links can be of arbitrary topology and architecture,
and are described herein for the purpose of simplicity as
point-to-point links or, more precisely, as virtual point-to-point
links. The servers S1-S5 comprise the network vendors
V1-Vk as well as the information servers I1-Im of FIG.
1 and the functions performed by these two classes of
modules can be merged to a greater or lesser extent in a
single server Si or distributed over a number of servers in the
data communication network N. Prior to proceeding with the
description of the preferred embodiment of the invention, a
number of terms are defined. FIG. 3 illustrates in block
diagram form a representation of an arbitrarily selected
network topology for a plurality of servers A-D, each of
which is interconnected to at least one other server and
typically also to a plurality of clients p-s. Servers A-D are
interconnected by a collection of point to point data
communications links, and server A is connected to client r,
server B is connected to clients p-q, while server D is
connected to client s. Servers transmit encrypted or
unencrypted messages amongst themselves: a message typically
contains the textual and/or graphic information stored in a
particular file, and also contains data which describe the type
and origin of this file, the name of the server that is supposed
to receive the message, and the purpose for which the file
contents are being transmitted. Some messages are not
associated with any file, but are sent by one server to other
servers for control reasons, for example to request transmission
of a file or to announce the availability of a new file.
Messages can be forwarded by a server to another server, as
in the case where server A transmits a message to server D
via a relay node of either server C or servers B, C. It is
generally preferable to have multiple paths through the
network, with each path being characterized by its performance
capability and cost to enable the network N to
optimize traffic routing. In one particular implementation
which is increasingly used on the World Wide Web, "channels"
of content are used to enable users to select topically
relevant areas of interest (e.g., National Geographic, Forbes,
The Wall Street Journal, USA Today, The Disney Channel,
Wired, CNN). These channels may be either accessed on
demand, downloaded in advance to the user (as part of a
"virtual subscription") or selectively retrieved, wherein the
user's profile dictates the items selected. In this approach the
items may be actively prefetched or filtered from a live chat
stream. Similarly the current methods for the custom news
filter may be used in this application to selectively filter and
present the most relevant programming selections to the
user, thus creating a "virtual channel". The basis for this
concept (using a one way down stream delivery architecture)
was detailed in a pending patent application.
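
The server-to-server forwarding described earlier in this section (server A reaching server D through relay C, or through relays B and C, with each path characterized by a cost) can be illustrated with a short least-cost path search. This is only a sketch: the link set is inferred from the two relay paths named for FIG. 3, the numeric costs are invented for illustration, and Dijkstra's algorithm is a stand-in, since the text does not say how network N optimizes routing.

```python
import heapq

# Links implied by the two relay paths named for FIG. 3 (A-C-D and A-B-C-D).
# The cost figures are hypothetical; the source text gives none.
LINKS = {
    ("A", "B"): 1.0,
    ("B", "C"): 1.0,
    ("A", "C"): 5.0,
    ("C", "D"): 1.0,
}


def build_graph(links):
    """Turn undirected (server, server) -> cost pairs into an adjacency map."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def cheapest_route(graph, source, target):
    """Dijkstra's algorithm: return (total_cost, [servers on path]) or None."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + link_cost, neighbor, path + [neighbor])
                )
    return None  # target unreachable


if __name__ == "__main__":
    print(cheapest_route(build_graph(LINKS), "A", "D"))
```

With these assumed costs the longer relay chain through B and C is cheaper than the direct hop through C, which is the kind of trade-off the multiple-path arrangement is meant to allow.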
In accordance with the techniques presently suggested,
just as categories of information contain profiles, the most
appropriate information (e.g., news information) can be
automatically routed to the most appropriate category.
Similarly content may be automatically routed to the most
appropriate virtual channels which appeal to a particular
type of audience (not only based on its content, but more
subjective criteria as well) offering a unique multi media
experience, writing or commentary style of its authors, etc.
For this reason it may be most appropriate to initially gather
relevance feedback of which users access the information in
order to develop statistical confidence as to its associative
attributes before it is routed to a particular channel. For
example, in this regard as with the presently described
techniques for customizing content through indexing,
navigation and delivery from the entire scope of available
information on the Internet, the scope of information may be
narrowed to that of a particular channel. Additionally,
because considerable overlap of content may occur between
channels, authors and editors of a particular channel may use
this technique to select the most desirable content from
which appropriate editing and revisions may be performed
as desired. These channels ideally are presented in combination
with virtual communities (e.g., virtual text and voice
chat rooms). They may accordingly be navigated to/from as
part of the 3-D representation of the surrounding information
space. For example, a virtual chat room associated with a
news channel may incorporate scheduled live interviews
with news reporters (or news makers) who had covered (or
had been involved in) a particular story or combination of
stories during which time participants may submit questions
or comments (pseudonymously if desired). Polls may be
taken about these users' views on each particular event or
controversial issues that are newsworthy. As suggested,
preference based attributes, demographics and psychological
user attributes may be statistically correlated with certain
news from survey question responses or as otherwise
submitted (such as in the form of active comments about that
particular issue). Because questions and comments from
many users may bombard a particular chat room, automated
methods may be used to more efficiently manage large
quantities of data. Specifically, the system may apply the
following techniques:
1. Real time automatic identification of similar queries or
comments which had been previously submitted (using
statistical NLP or deeper NLU techniques). Once a user
has submitted a question or comment, the system
instantaneously indexes any similar item(s) previously
submitted, automatically notifies the user that the user's
submission has been canceled and automatically
retrieves the previously submitted response to that
previously submitted item. In the context of an ascribed
posting to news groups, currently known techniques
such as auto-FAQ are able to generate FAQs automatically.
For either live chat or (asynchronous)
newsgroups, this technique may instead be used to
eliminate redundancy by identifying (by indexing in
real time via statistical NLP) pre-existing similar
correspondences to those which are about to be initiated.
2. Automatically determine the predicted value of a user's
comments and responses. This may be determined as
the product of number and length of comments submitted
in response to that user's postings, as well as the
estimated predicted value of the response based upon
the estimated value of that associated particular
respondent's knowledge within the knowledge domain of the
content profile of that response as well as the time that
users spend reading the posting from the user's interest
profile. Again, the relevance of this factor is also the
product of the reader's knowledge within the knowledge
domain of the content profile of the user's message.
In the application to a future guest or moderator
of a bulletin board or chat room (or a variation thereof
called a "virtual talk show" in which the moderator
fields questions by participants) the most predictively
"valuable" questions, comments and/or responses are
selectively prioritized for submission and reading (if a
response) by the other participants. For the newsgroup
application, items which are highest priority are
presented first, responses to the same which are of highest
priority are posted. Additionally, an item which is very
similar though not as "valued" may also receive a lower
value score which is less valuable though more unlike
other items. In the application to live chat, because the
associative attribute of the list of readers of the item is
unavailable (in real time), instead the real time profiling
of the message is performed and any predictive value
estimated based upon that user's determined skill
(value) within the knowledge domain of his/her message.
Additionally, value estimation may be converted
to actual price values (using the exchange of soft
currency) as a variation of the price point determination
scheme. In this regard, dialogues, user-submitted
queries, and anticipated responses thereto are appropriately
matched, priced (value appraised), a "net balance"
is automatically determined for each informational
exchange (or transaction) and each user's
"account" is debited or credited accordingly. If desired,
participants external to a particular transaction may
passively observe the net cost of each transaction, the
price and, if the user perceives the estimated value to be
inappropriate, he/she may submit a suggested modification
of its value. These recommendations may be
averaged in order to determine the most appropriate net
transaction value. Again the relevance may be adjusted
to the recommendation in accordance with the skill of
that user within that knowledge domain for determining
the actual modified value. This approach may be
applied also in the context of an Intranet (or
multi-organizational Intranet).
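
The averaging of suggested value modifications, with each recommendation adjusted by the recommending user's skill in the relevant knowledge domain, can be read as a weighted mean. The sketch below takes that reading as an assumption: the function name, the representation of skill as a non-negative weight, and the sample figures are all hypothetical and are not specified in the text.

```python
def adjusted_transaction_value(suggestions):
    """Skill-weighted mean of suggested values.

    `suggestions` is a list of (suggested_value, skill_weight) pairs, where
    skill_weight is a non-negative number standing in for the suggesting
    user's skill within the knowledge domain (an assumed representation).
    """
    total_weight = sum(weight for _, weight in suggestions)
    if total_weight <= 0:
        raise ValueError("at least one suggestion needs a positive skill weight")
    weighted_sum = sum(value * weight for value, weight in suggestions)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A low-skill observer suggests 10 units; a high-skill observer suggests 20.
    print(adjusted_transaction_value([(10.0, 1.0), (20.0, 3.0)]))
```

Under this reading a recommendation from a user with three times the skill weight pulls the result three times as hard, so the combined value lands nearer the more knowledgeable observer's figure.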
Several applications to bandwidth content delivery may be
included, including video on demand wherein video and
audio programming content may be delivered to the user.
Techniques for customizing program guide selections to
users have been detailed in the pending patent application entitled
"System and Method for Scheduling Broadcast and Access
to Video Program and Other Data Using Customer
Profiles". The present system may readily be applicable to radio
programming sent over cable (or the Internet). Particularly
for short programming selections like music, music video
and short audio or multimedia segments, it is desirable to
automate the selection process by creating a "virtual
channel" of selections which are retrieved sequentially. As
previously described, existing channels may be accessible to
users on the WWW. These techniques for automated sequential
retrieval of content may be another implementation of
another channel (e.g., using cable as a high bandwidth
transmission medium to access a video server on the
WWW). Another application of this architecture could be
use of a client processor in a video store which receives
purchases from the user's account, is maintained on the local
server and the similarity measurements are processed locally
or performed by a video server which may deliver high
bandwidth video, audio (e.g., music) or multi media software
to a compact disc at the store which is customized to
the user's preferences. If user purchasing records don't yet
exist or are not complete, the rapid profiling system may
construct the user's profile. This system may be implemented
as a stand alone credit card or smart card enabled
kiosk which may be equipped with (for example) the currently
described menu navigation and query techniques.
Proxy Servers and Pseudonymous Transactions 

While the method of using target profile interest summaries
presents many advantages to both target object providers
and users, there are important privacy issues for both
users and providers that must be resolved if the system is to
be used freely and without inhibition by users without fear
of invasion of privacy. It is likely that users desire that some,
if not all, of the user-specific information in their user
profiles and target profile interest summaries remain
confidential, to be disclosed only under certain circumstances
related to certain types of transactions and according
to their personal wishes for differing levels of confidentiality
regarding their purchases and expressed interests.

However, complete privacy and inaccessibility of user
transactions and profile summary information would hinder
implementation of the system for customized electronic
identification of desirable objects and would deprive the user
of many of the advantages derived through the system's use
of user-specific information. In many cases, complete and
total privacy is not desired by all parties to a transaction. For
example, a buyer may desire to be targeted for certain
mailings that describe products that are related to his or her
interests, and a seller may desire to target users who are
predicted to be interested in the goods and services that the
seller provides. Indeed, the usefulness of the technology
described herein is contingent upon the ability of the system
to collect and compare data about many users and many
target objects. A compromise between total user anonymity
and total public disclosure of the user's search profiles or
target profile interest summary is a pseudonym. A pseudonym
is an artifact that allows a service provider to communicate
with users and build and accumulate records of
their preferences over time, while at the same time remaining
ignorant of the users' true identities, so that users can
keep their purchases or preferences private. A second and
equally important requirement of a pseudonym system is
that it provide for digital credentials, which are used to
guarantee that the user represented by a particular pseudonym
has certain properties. These credentials may be
granted on the basis of results of activities and transactions
conducted by means of the system for customized electronic
identification of desirable objects, on the basis of other
activities and transactions conducted on the network N of
the present system, or on the basis of users' activities outside
of network N. For example, a service provider may require
proof that the purchaser has sufficient funds on deposit at
his/her bank, which might possibly not be on a network,
before agreeing to transact business with that user. The user,
therefore, must provide the service provider with proof of
funds (a credential) from the bank, while still not disclosing
the user's true identity to the service provider.
Our method solves the above problems by combining the
pseudonym granting and credential transfer methods taught
by D. Chaum and J. H. Evertse, in the paper titled "A secure
and privacy-protecting protocol for transmitting personal
information between organizations," with the implementation
of a set of one or more proxy servers distributed
throughout the network N. Each proxy server, for example
S2 in FIG. 2, is a server which communicates with clients
and other servers S5 in the network either directly or through
anonymizing mix paths as detailed in the paper by D. Chaum
titled "Untraceable Electronic Mail, Return Addresses, and
Digital Pseudonyms," published in Communications of the
ACM, Volume 24, Number 2, February 1981. Any server in
the network N may be configured to act as a proxy server in
addition to its other functions. Each proxy server provides
service to a set of users, which set is termed the "user base"
of that proxy server. A given proxy server provides three
sorts of service to each user U in its user base, as follows:
1. The first function of the proxy server is to bidirectionally
transfer communications between user U and other
entities such as information servers (possibly including
the proxy server itself) and/or other users. Specifically,
letting S denote the server that is directly associated
with user U's client processor, the proxy server
communicates with server S (and thence with user U),
either through anonymizing mix paths that obscure the
identity of server S and user U, in which case the proxy
server knows user U only through a secure pseudonym,
or else through a conventional virtual point-to-point
connection, in which case the proxy server knows user
U by user U's address at server S, which address may
be regarded as a non-secure pseudonym for user U.

2. A second function of the proxy server is to record
user-specific information associated with user U. This
user-specific information includes a user profile and
target profile interest summary for user U, as well as a
list of access control instructions specified by user U, as
described below, and a set of one-time return addresses
provided by user U that can be used to send messages
to user U without knowing user U's true identity. All of
this user-specific information is stored in a database
that is keyed by user U's pseudonym (whether secure
or non-secure) on the proxy server.

3. A third function of the proxy server is to act as a
selective forwarding agent for unsolicited communications
that are addressed to user U: the proxy server
forwards some such communications to user U and
rejects others in accordance with the access control
instructions specified by user U.
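
The three proxy server functions listed above can be sketched as a small in-memory model. Everything concrete here is an assumption made for illustration: the class and field names are hypothetical, the access control instructions are reduced to a set of blocked sender names, and a one-time return address is modeled as a string that is consumed when used. Mix paths, encryption and credentials are left out.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """User-specific information the proxy stores under a pseudonym."""
    user_profile: dict
    interest_summary: dict
    # Hypothetical encoding of the access control instructions: senders
    # whose unsolicited messages must be rejected.
    blocked_senders: set = field(default_factory=set)
    # One-time return addresses supplied by the user; each is used once.
    return_addresses: list = field(default_factory=list)


class ProxyServer:
    def __init__(self):
        # Function 2: database keyed by pseudonym, never by true identity.
        self.database = {}

    def register(self, pseudonym, record):
        self.database[pseudonym] = record

    def relay(self, pseudonym, payload):
        """Function 1: pass a message toward the user via a one-time address."""
        record = self.database[pseudonym]
        if not record.return_addresses:
            raise LookupError("no one-time return address left for " + pseudonym)
        return record.return_addresses.pop(0), payload

    def forward_unsolicited(self, pseudonym, sender, payload):
        """Function 3: forward or reject per the user's access control rules."""
        record = self.database[pseudonym]
        if sender in record.blocked_senders:
            return None  # rejected, nothing reaches the user
        return self.relay(pseudonym, payload)


if __name__ == "__main__":
    proxy = ProxyServer()
    proxy.register(
        "P1",
        UserRecord({}, {"news": 0.9}, {"spammer"}, ["addr-1", "addr-2"]),
    )
    print(proxy.forward_unsolicited("P1", "spammer", "buy now"))
    print(proxy.forward_unsolicited("P1", "bookshop", "new title"))
```

Keying the store by pseudonym alone is the point of the design: the sender of a message only ever names the pseudonym, and only the proxy holds the return addresses that lead back toward the user.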

Our combined method allows a given user to use either a
single pseudonym in all transactions where he or she wishes
to remain pseudonymous, or else different pseudonyms for
different types of transactions. In the latter case, each service
provider might transact with the user under a different
pseudonym for the user. More generally, a coalition of
service providers, all of whom match users with the same
genre of target objects, might agree to transact with the user
using a common pseudonym, so that the target profile
interest summary associated with that pseudonym would be
complete with respect to said genre of target objects. When
a user employs several pseudonyms in order to transact with
different coalitions of service providers, the user may freely
choose a proxy server to service each pseudonym; these
proxy servers may be the same or different.

From the service provider's perspective, our system pro- 
vides security, in that it can guarantee that users of a service 
are legitimately entitled to the services used and that no user 
is using multiple pseudonyms to communicate with the same 
provider. This uniqueness of pseudonyms is important for 
the purposes of this application, since the transaction
information gathered for a given individual must represent a
complete and consistent picture of a single user's activities
with respect to a given service provider or coalition of
service providers; otherwise, a user's target profile interest 
summary and user profile would not be able to represent the 
user's interests to other parties as completely and accurately 
as possible. 

The service provider must have a means of protection 
from users who violate previously agreed upon terms of 
service. For example, if a user that uses a given pseudonym 
engages in activities that violate the terms of service, then 
the service provider should be able to take action against the 
user, such as denying the user service and blacklisting the 
user from transactions with other parties that the user might 
be tempted to defraud. This type of situation might occur 
when a user employs a service provider for illegal activities 
or defaults in payments to the service provider. The method
of the paper titled "Security without identification:
Transaction systems to make Big-Brother obsolete", published in
the Communications of the ACM, 28(10), October 1985,
pp. 1030-1044, incorporated herein, provides for a mechanism
to enforce protection against this type of behavior
through the use of resolution credentials, which are credentials
that are periodically provided to individuals contingent
upon their behaving consistently with the agreed upon terms
of service between the user and information provider and
network vendor entities (such as regular payment for services
rendered, civil conduct, etc.). For the user's safety, if
the issuer of a resolution credential refuses to grant this 
resolution credential to the user, then the refusal may be 
appealed to an adjudicating third party. The integrity of the 
user profiles and target profile interest summaries stored on 
proxy servers is important: if a seller relies on such
user-specific information to deliver promotional offers or other
material to a particular class of users, but not to other users,
then the user-specific information must be accurate and
untampered with in any way. The user may likewise wish to
ensure that other parties not tamper with the user's user
profile and target profile interest summary, since such
modification could degrade the system's ability to match the user
with the most appropriate target objects. This is done by
providing for the user to apply digital signatures to the
control messages sent by the user to the proxy server. Each
pseudonym is paired with a public cryptographic key and a
private cryptographic key, where the private key is known
only to the user who holds that pseudonym; when the user
sends a control message to a proxy server under a given
pseudonym, the proxy server uses the pseudonym's public
key to verify that the message has been digitally signed by
someone who knows the pseudonym's private key. This
prevents other parties from masquerading as the user.

Our approach, as disclosed in this application, provides an
improvement over the prior art in privacy-protected
pseudonymity for network subscribers such as taught in U.S. Pat.
No. 5,245,656, which provides for a name translator station
to act as an intermediary between a service provider and the
user. However, while U.S. Pat. No. 5,245,656 provides that
the information transmitted between the end user U and the
service provider be doubly encrypted, the fact that a
relationship exists between user U and the service provider is
known to the name translator, and this fact could be used to
compromise user U, for example if the service provider
specializes in the provision of content that is not deemed
acceptable by user U's peers. The method of U.S. Pat. No.
5,245,656 also omits a method for the convenient updating
of pseudonymous user profile information, such as is
provided in this application, and does not provide for assurance
of unique and credentialed registration of pseudonyms from
a credentialing agent as is also provided in this application,
and does not provide a means of access control to the user
based on profile information and conditional access as will
be subsequently described. The method described by Loeb et
al. also does not describe any provision for credentials, such
as might be used for authenticating a user's right to access
particular target objects, such as target objects that are
intended to be available only upon payment of a subscription
fee, or target objects that are intended to be unavailable to
younger users.
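
The signed control messages described earlier in this section (each pseudonym paired with a public and a private key, with the proxy server checking the signature against the public key) can be illustrated with a deliberately small textbook RSA example. This is a teaching sketch only and is not secure: the two primes are well-known Mersenne primes chosen so the code runs with the standard library alone, there is no padding, and the text does not name any particular signature scheme. The names `sign` and `verify` are hypothetical.

```python
import hashlib

# Toy key pair for one pseudonym. NOT SECURE: small, publicly known primes
# and no padding. Requires Python 3.8+ for pow(x, -1, m).
P = 2**127 - 1                         # Mersenne prime
Q = 2**89 - 1                          # Mersenne prime
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known only to the user


def _digest(message: bytes) -> int:
    """Hash the control message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int = D) -> int:
    """User side: sign a control message with the pseudonym's private key."""
    return pow(_digest(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int = E) -> bool:
    """Proxy side: check the signature using only the public key (N, E)."""
    return pow(signature, public_exponent, N) == _digest(message)


if __name__ == "__main__":
    control = b"pseudonym=P1;action=update-access-control"
    sig = sign(control)
    print(verify(control, sig))                      # genuine message
    print(verify(b"pseudonym=P1;action=other", sig))  # altered message
```

A real deployment would use a vetted signature scheme from an audited cryptographic library; the sketch only shows the division of labor, in which the proxy can check a message without ever holding the private key.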

Proxy Server Description 

In order that a user may ensure that some or all of the
information in the user's user profile and target profile
interest summary remain dissociated from the user's true
identity, the user employs as an intermediary any one of a
number of proxy servers available on the data communication
network N of FIG. 2 (for example, server S2). The
proxy servers function to disguise the true identity of the
user from other parties on the data communication network
N. The proxy server represents a given user to either single
network vendors and information servers or coalitions
thereof. A proxy server, e.g. S2, is a server computer with
CPU, main memory, secondary disk storage and network
communication function and with a database function which
retrieves the target profile interest summary and access
control instructions associated with a particular pseudonym
P, which represents a particular user U, and performs
bi-directional routing of commands, target objects and billing
information between the user at a given client (e.g. C3)
and other network entities such as network vendors V1-Vk
and information servers I1-Im. Each proxy server maintains
an encrypted target profile interest summary associated with
each allocated pseudonym in its pseudonym database D. The
actual user-specific information and the associated pseudonyms
need not be stored locally on the proxy server, but
may alternatively be stored in a distributed fashion and be
remotely addressable from the proxy server via point-to-point
connections.

The proxy server supports two types of bi-directional
connections: point-to-point connections and pseudonymous
connections through mix paths, as taught by D. Chaum in the
paper titled "Untraceable Electronic Mail, Return
Addresses, and Digital Pseudonyms", Communications of
the ACM, Volume 24, Number 2, February 1981. The
normal connections between the proxy server and information
servers, for example a connection between proxy server
S2 and information server S4 in FIG. 2, are accomplished
through the point-to-point connection protocols provided by
network N as described in the "Electronic Media System
Architecture" section of this application. The normal type of
point-to-point connections may be used between S2-S4, for
example, since the dissociation of the user and the pseudonym
need only occur between the client C3 and the proxy
server S2, where the pseudonym used by the user is available.
Knowing that an information provider such as S4
communicates with a given pseudonym P on proxy server S2
does not compromise the true identity of user U. The
bidirectional connection between the user and the proxy
server S2 can also be a normal point-to-point connection, but
it may instead be made anonymous and secure, if the user
desires, through the consistent use of an anonymizing mix
protocol as taught by D. Chaum in the paper titled "Untraceable
Electronic Mail, Return Addresses, and Digital
Pseudonyms", Communications of the ACM, Volume 24,
Number 2, February 1981. This mix procedure provides
untraceable secure anonymous mail between two parties with
blind return addresses through a set of forwarding and return
routing servers termed "mixes". The mix routing protocol,
as taught in the Chaum paper, is used with the proxy server
S2 to provide a registry of persistent secure pseudonyms that
can be employed by users other than user U, by information
providers I1-Im, by vendors V1-Vk and by other proxy
servers to communicate with the users in the proxy server's
user base on a continuing basis. The security provided by
this mix path protocol is distributed and resistant to traffic
analysis attacks and other known forms of analysis which
may be used by malicious parties to try and ascertain the true
identity of a pseudonym bearer. Breaking the protocol
requires a large number of parties to maliciously collude or
be cryptographically compromised. In addition an extension
to the method is taught where the user can include a return

provide for access and reachability control under user and
proxy server control.

Validation and Allocation of a Unique Pseudonym

Chaum's pseudonym and credential issuance system, as
described in a publication by D. Chaum and J. H. Evertse,
titled "A secure and privacy-protecting protocol for
transmitting personal information between organizations," has
several desirable properties for use as a component in our
system. The system allows for individuals to use different
pseudonyms with different organizations (such as banks and
coalitions of service providers). The organizations which are
presented with a pseudonym have no more information
about the individual than the pseudonym itself and a record
of previous transactions carried out under that pseudonym.
Additionally, credentials, which represent facts about a
pseudonym that an organization is willing to certify, can be
granted to a particular pseudonym, and transferred to other
pseudonyms that the same user employs. For example, the
user can use different pseudonyms with different organizations
(or disjoint sets of organizations), yet still present
credentials that were granted by one organization, under one
pseudonym, in order to transact with another organization
under another pseudonym, without revealing that the two
pseudonyms correspond to the same user. Credentials may
be granted to provide assurances regarding the pseudonym
bearer's age, financial status, legal status, and the like. For
example, credentials signifying "legal adult" may be issued
to a pseudonym based on information known about the
corresponding user by the given issuing organization. Then,
when the credential is transferred to another pseudonym that
represents the user to another disjoint organization,
presentation of this credential on the other pseudonym can be taken
as proof of legal adulthood, which might satisfy a condition
of terms of service. Credential-issuing organizations may
also certify particular facts about a user's demographic
profile or target profile interest summary, for example by
granting a credential that asserts "the bearer of this pseudonym
is either well-read or is middle-aged and works for a
large company"; by presenting this credential to another
entity, the user can prove eligibility for (say) a discount
without revealing the user's personal data to that entity.
Additionally, the method taught by Chaum provides for
assurances that no individual may correspond with a given
organization or coalition of organizations using more than
one pseudonym; that credentials may not be feasibly forged
by the user; and that credentials may not be transferred from
one user's pseudonym to a different user's pseudonym.
Finally, the method provides for expiration of credentials
and for the issuance of "black marks" against individuals
who do not act according to the terms of service that they are
extended. This is done through the resolution credential
mechanism as described in Chaum's work, in which resolutions
are issued periodically by organizations to pseudonyms
that are in good standing. If a user is not issued this
resolution credential by a particular organization or coalition
of organizations, then this user cannot have it available to be
transferred to other pseudonyms which he uses with other
organizations. Therefore, the user cannot convince these
other organizations that he has acted in accordance with terms
of service in other dealings. If this is the case, then the
organization can use this lack of resolution credential to
infer that the user is not in good standing in his other
dealings. In one approach organizations (or other users) may
issue a list of quality related credentials based upon the
path definition in the messagie so the information server S4 6S experience of transaction (or interaction) with the user 
can return the requested information to the user's client which may act similariy to a letter of recommendation as in 
processor C3. We utilize this feature in a novel fashion to a resume. If such a credential is issued from multiple 
organizations, their values become averaged. In an alternative
variation organizations may be issued credentials from
users such as customers which may be used to indicate to
other future users quality of service which can be expected
by subsequent users on the basis of various criteria. In one
approach, the system automatically generates the primary
attributes contained in the profile of the user or organization.
Each attribute is then appropriately rated in order to become
a list of quality related credentials.

In our implementation, a pseudonym is a data record
consisting of two fields. The first field specifies the address
of the proxy server at which the pseudonym is registered.
The second field contains a unique string of bits (e.g., a
random binary number) that is associated with a particular
user; credentials take the form of public-key digital signatures
computed on this number, and the number itself is
issued by a pseudonym administering server Z, as depicted
in FIG. 2, and detailed in a generic form in the paper by D.
Chaum and J. H. Evertse, titled "A secure and privacy-protecting
protocol for transmitting personal information
between organizations." It is possible to send information to
the user holding a given pseudonym, by enveloping the
information in a control message that specifies the pseudonym
and is addressed to the proxy server that is named in
the first field of the pseudonym; the proxy server may
forward the information to the user upon receipt of the
control message.

While the user may use a single pseudonym for all
transactions, in the more general case a user has a set of
several pseudonyms, each of which represents the user in his
or her interactions with a single provider or coalition of
service providers. Each pseudonym in the pseudonym set is
designated for transactions with a different coalition of
related service providers, and the pseudonyms used with one
provider or coalition of providers cannot be linked to the
pseudonyms used with other disjoint coalitions of providers.
All of the user's transactions with a given coalition can be
linked by virtue of the fact that they are conducted under the
same pseudonym, and therefore can be combined to define
a unified picture, in the form of a user profile and a target
profile interest summary, of the user's interests vis-a-vis the
service or services provided by said coalition. There are
other circumstances for which the use of a pseudonym may
be useful and the present description is in no way intended
to limit the scope of the claimed invention; for example, the
previously described rapid profiling tree could be used to
pseudonymously acquire information about the user which
is considered by the user to be sensitive, such as that
information which is of interest to such entities as insurance
companies, medical specialists, family counselors or dating
services.

Detailed Protocol

In our system, the organizations that the user U interacts
with are the servers S1-Sn on the network N. However,
rather than directly corresponding with each server, the user
employs a proxy server, e.g. S2, as an intermediary, between
the local server of the user's own client and the information
provider or network vendor. Mix paths as described by D.
Chaum in the paper titled "Untraceable Electronic Mail,
Return Addresses, and Digital Pseudonyms", Communications
of the ACM, Volume 24, Number 2, February 1981
allow for untraceability and security between the client, such
as C3, and the proxy server, e.g. S2. Let S(M,K) represent
the digital signing of message M by modular exponentiation
with key K as detailed in a paper by Rivest, R. L., Shamir,
A., and Adleman, L. titled "A method for obtaining digital
signatures and public-key cryptosystems", published in the
Comm. ACM 21, February 2, 120-126. Once a user applies
to server Z for a pseudonym P and is granted a signed
pseudonym signed with the private key SKz of server Z, the
following protocol takes place to establish an entry for the
user U in the proxy server S2's database D.

1. The user now sends proxy server S2 the pseudonym, which
has been signed by Z to indicate the authenticity and
uniqueness of the pseudonym. The user also generates a
PKp, SKp key pair for use with the granted pseudonym,
where SKp is the private key associated with the pseudonym
and PKp is the public key associated with the pseudonym.
The user forms a request to establish pseudonym P on proxy
server S2, by sending the signed pseudonym S(P, SKz) to the
proxy server S2 along with a request to create a new database
entry, indexed by P, and the public key PKp. It envelopes
the message and transmits it to a proxy server S2 through an
anonymizing mix path, along with an anonymous return
envelope header.

2. The proxy server S2 receives the database creation entry
request and associated certified pseudonym message. The
proxy server S2 checks to ensure that the requested pseudonym
P is signed by server Z and if so grants the request and
creates a database entry for the pseudonym, as well as
storing the user's public key PKp to ensure that only the user
U can make requests in the future using pseudonym P.

3. The structure of the user's database entry consists of a user
profile as detailed herein, a target profile interest summary as
detailed herein, and a Boolean combination of access control
criteria as detailed below, along with the associated public
key for the pseudonym P.

4. At any time after the database entry for pseudonym P is
established, the user U may provide proxy server S2 with
credentials on that pseudonym, provided by third parties,
which credentials make certain assertions about that
pseudonym. The proxy server may verify those credentials
and make appropriate modifications to the user's profile as
required by these credentials, such as recording the user's
new demographic status as an adult. It may also store those
credentials, so that it can present them to service providers
on the user's behalf.

The above steps may be repeated, with either the same or
a different proxy server, each time user U requires a new
pseudonym for use with a new and disjoint coalition of
providers. In practice there is an extremely small probability
that a given pseudonym may have already been allocated,
due to the random nature of the pseudonym generation
process carried out by Z. If this highly unlikely event occurs,
then the proxy server S2 may reply to the user with a signed
message indicating that the generated pseudonym has
already been allocated, and asking for a new pseudonym to
be generated.

Pseudonymous Control of an Information Server

Once a proxy server S2 has authenticated and registered
a user's pseudonym, the user may begin to use the services
of the proxy server S2, in interacting with other network
entities such as service providers, as exemplified by server
S4 in FIG. 2, an information service provider node connected
to the network. The user controls the proxy server S2
by forming digitally encoded requests that the user subsequently
transmits to the proxy server S2 over the network N.
The nature and format of these requests will vary, since the
proxy server may be used for any of the services described
in this application, such as the browsing, querying, and other
navigational functions described below.

In a generic scenario, the user wishes to communicate
under pseudonym P with a particular information provider or
user at address A, where P is a pseudonym allocated to the
user and A is either a public network address at a server such
as S4, or another pseudonym that is registered on a proxy
server such as S4. (In the most common version of this
scenario, address A is the address of an information provider,
and the user is requesting that the information provider send
target objects of interest.) The user must form a request R to
proxy server S2, that requests proxy server S2 to send a
message to address A and to forward the response back to the
user. The user may thereby communicate with other parties,
either non-pseudonymous parties, in the case where address
A is a public network address, or pseudonymous parties, in
the case where address A is a pseudonym held by, for
example, a business or another user who prefers to operate
pseudonymously.
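
As a rough illustration of the notation S(M,K) and of a request R of the kind just described, the following Python sketch signs a request by modular exponentiation and bundles it with a pseudonym. It is only a sketch under stated assumptions: the toy key (p=61, q=53), the use of SHA-256, the dictionary layout, and the function names `digest`, `sign`, `verify` and `form_request` are all hypothetical and do not come from this description, and unpadded textbook RSA with so small a modulus offers no real security.

```python
import hashlib

# Toy textbook-RSA parameters (p=61, q=53). Illustrative only: far too small
# and unpadded to be secure. All names below are hypothetical.
N = 61 * 53   # modulus, 3233
E = 17        # public exponent, standing in for PKp
D = 2753      # private exponent, standing in for SKp (E * D = 1 mod 3120)


def digest(message: bytes, modulus: int) -> int:
    """Reduce a SHA-256 digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def sign(message: bytes, private_exp: int, modulus: int) -> int:
    """S(M, K): modular exponentiation of the message digest with key K."""
    return pow(digest(message, modulus), private_exp, modulus)


def verify(message: bytes, signature: int, public_exp: int, modulus: int) -> bool:
    """Check a signature by exponentiating with the matching public key."""
    return pow(signature, public_exp, modulus) == digest(message, modulus)


def form_request(address: str, body: bytes, pseudonym: str) -> dict:
    """Bundle a request R (address A plus a message) with P and S(R, SKp)."""
    request = address.encode() + b"\n" + body
    return {
        "pseudonym": pseudonym,
        "request": request,
        "signature": sign(request, D, N),
    }
```

A proxy server holding the public exponent for pseudonym P could then call `verify(msg["request"], msg["signature"], E, N)` before acting on the request.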

In other scenarios, the request R to proxy server S2
formed by the user may have different content. For example,
request R may instruct proxy server S2 to use the methods
described later in this description to retrieve from the most
convenient server a particular piece of information that has
been multicast to many servers, and to send this information
to the user. Conversely, request R may instruct proxy server
S2 to multicast to many servers a file associated with a new
target object provided by the user, as described below. If the
user is a subscriber to the news clipping service described
below, request R may instruct proxy server S2 to forward to
the user all target objects that the news clipping service has
sent to proxy server S2 for the user's attention. If the user is
employing the active navigation service described below,
request R may instruct proxy server S2 to select a particular
cluster from the hierarchical cluster tree and provide a menu
of its subclusters to the user, or to activate a query that
temporarily affects proxy server S2's record of the user's
target profile interest summary. If the user is a member of a
virtual community as described below, request R may
instruct proxy server S2 to forward to the user all messages
that have been sent to the virtual community.

Regardless of the content of request R, the user, at client
C3, initiates a connection to the user's local server S1, and
instructs server S1 to send the request R along a secure mix
path to the proxy server S2, initiating the following sequence
of actions:

1. The user's client processor C3 forms a signed message
S(R, SKp), which is paired with the user's pseudonym
P and (if the request R requires a response) a secure
one-time set of return envelopes, to form a message M.
It protects the message M with a multiply enveloped
route for the outgoing path. The enveloped routes
provide for secure communication between S1 and the
proxy server S2. The message M is enveloped in the
most deeply nested message and is therefore difficult to
recover should the message be intercepted by an
eavesdropper.

2. The message M is sent by client C3 to its local server
S1, and is then routed by the data communication
network N from server S1 through a set of mixes as
dictated by the outgoing envelope set and arrives at the
selected proxy server S2.

3. The proxy server S2 separates the received message M
into the request message R, the pseudonym P, and (if
included) the set of envelopes for the return path. The
proxy server S2 uses pseudonym P to index and retrieve
the corresponding record in proxy server S2's database,
which record is stored in local storage at the proxy
server S2 or on other distributed storage media
accessible to proxy server S2 via the network N. This record
contains a public key PKp, user-specific information,
and credentials associated with pseudonym P. The
proxy server S2 uses the public key PKp to check that
the signed version S(R, SKp) of request message R is
valid.
4. Provided that the signature on request message R is
valid, the proxy server S2 acts on the request R. For
example, in the generic scenario described above,
request message R includes an embedded message M1
and an address A to whom message M1 should be sent;
in this case, proxy server S2 sends message M1 to the
server named in address A, such as server S4. The
communication is done using signed and optionally
encrypted messages over the normal point to point
connections provided by the data communication
network N. When necessary in order to act on embedded
message M1, server S4 may exchange or be caused to
exchange further signed and optionally encrypted
messages with proxy server S2, still over normal point to
point connections, in order to negotiate the release of
user-specific information and credentials from proxy
server S2. In particular, server S4 may require server S2
to supply credentials proving that the user is entitled to
the information requested; for example, proving that
the user is a subscriber in good standing to a particular
information service, that the user is old enough to
legally receive adult material, and that the user has been
offered a particular discount (by means of a special
discount credential issued to the user's pseudonym).
Such a special discount credential may be automatically
provided by a trusted process residing in the
proxy server, i.e. the price point algorithm. In one
approach, this special discount credential may persist
so long as the trusted process on the proxy server
allows it to (it provides that user with access to an
appropriate discount, and may be termed a "digital
coupon"). In another variation, the terms of the special
discount credential may vary in accordance with certain
user actions (which are pre-specified to the user), e.g.
automatically modifying the degree or nature of the
discount in response to user purchasing behavior
towards that vendor or product (or jointly marketed
products or a vendor consortium). This may be termed
a "digital shopper's card".

5. If proxy server S2 has sent a message to a server S4 and
server S4 has created a response M2 to message M1 to
be sent to the user, then server S4 transmits the
response M2 to the proxy server S2 using normal
network point-to-point connections.

6. The proxy server S2, upon receipt of the response M2,
creates a return message Mr comprising the response
M2 embedded in the return envelope set that was
earlier transmitted to proxy server S2 by the user in the
original message M. It transmits the return message Mr
along the pseudonymous mix path specified by this
return envelope set, so that the response M2 reaches the
user at the user's client processor C3.

7. The response M2 may contain a request for electronic
payment to the information server S4. The user may
then respond by means of a message M3 transmitted by
the same means as described for message M1 above,
which message M3 encloses some form of anonymous
payment. Alternatively, the proxy server may respond
automatically with such a payment, which is debited
from an account maintained by the proxy server for this
user.

8. Either the response message M2 from the information 
server S4 to the user, or a subsequent message sent by 
the proxy server S2 to the user, may contain advertising 
material that is related to the user's request and/or is 
targeted to the user. Typically, if the user has just 
retrieved a target object X, then (a) either proxy server 
S2 or information server S4 determines a weighted set
of advertisements that are "associated with" target
object X, (b) a subset of this set is chosen randomly,
where the weight of an advertisement is proportional to
the probability that it is included in the subset, and (c)
[illegible]. In the variation where proxy server S2
determines the set of advertisements associated with target
object X, then this set typically consists of all
advertisements that the proxy server's owner has been paid
to disseminate and whose target profiles are within a
threshold similarity distance of the target profile of
target object X. In the variation where information server S4
determines the set of advertisements associated with
target object X, advertisers typically purchase the right
to include advertisements in this set. In either case, the
weight of an advertisement is determined by the
amount that an advertiser is willing to pay. Following
step (c), proxy server S2 retrieves the selected
advertising material and transmits it to the user's client
processor C3, where it will be displayed to the user,
within a specified length of time after it is received, by
a trusted process running on the user's client processor
C3. When proxy server S2 transmits an advertisement,
it sends a message to the advertiser, indicating that the
advertisement has been transmitted to a user with a
particular predicted level of interest. The message may
also indicate the identity of target object X. In return,
the advertiser may transmit an electronic payment to
proxy server S2; proxy server S2 retains a service fee
for itself, optionally forwards a service fee to
information server S4, and the balance is forwarded to the user
or used to credit the user's account on the proxy server.

9. If the response M2 contains or identifies a target object,
the passive and/or active relevance feedback that the
user provides on this object is tabulated by a process on
the user's client processor C3. A summary of such
relevance feedback information, digitally signed by
client processor C3 with a proprietary private key
SKC3, is periodically transmitted through a secure
mix path to the proxy server S2, whereupon the search
profile generation module 202 resident on server S2
updates the appropriate target profile interest summary
associated with pseudonym P, provided that the
signature on the summary message can be authenticated with
the corresponding public key PKC3, which is available
to all tabulating processes that are ensured to have
integrity.

When a consumer enters into a financial relationship with
a particular information server based on both parties
agreeing to terms for the relationship, a particular pseudonym
may be extended for the consumer with respect to the given
provider as detailed in the previous section. When entering
into such a relationship, the consumer and the service
provider agree to certain terms. However, if the user violates
the terms of this relationship, the service provider may
decline to provide service to the pseudonym under which it
transacts with the user. In addition, the service provider has
the recourse of refusing to provide resolution credentials to
the pseudonym, and may choose to do so until the
pseudonym bearer returns to good standing.

Pre-Fetching of Target Objects

In some circumstances, a user may request access in
sequence to many files, which are stored on one or more
information servers. This behavior is common when
navigating a hypertext system such as the World Wide Web, or
when using the target object browsing system described
below. In general, the user requests access to a particular target
object or menu of target objects; once the corresponding file
has been transmitted to the user's client processor, the user
views its contents and makes another such request, and so
on. Each request may take many seconds to satisfy, due to
transmission delays. However, to the extent that the system
for customized electronic identification of desirable objects can
[illegible] retrieving or [illegible] appropriate files even
before the user requests them. This early retrieval is termed
"pre-fetching of files." As earlier suggested the present system
also enables [illegible] automatically ranked hyperlinks in
accordance with [illegible] relative priority to the user profile.
By combining this approach with prefetching (suggesting to the
user for [illegible] prefetching has already been initiated)
overall prediction of the next user action is further enhanced.

Pre-fetching of locally stored data has been heavily studied
in memory hierarchies, including CPU caches and secondary
storage (disks), for several decades. A leader in this
area has been A. J. Smith of Berkeley, who identified a
variety of schemes and analyzed opportunities using extensive
traces in both databases and CPU caches. His conclusion
was that general schemes only really paid off where
[illegible] of data. As the balances [illegible] identified
further opportunities for pre-fetching of [illegible] network
data. In particular, deeper analysis of patterns in work by
Blaha showed the possibility of using expert systems for deep
pattern analysis [illegible] pre-fetching. Work by J. M. Smith
proposed the use of reference history trees to anticipate
references in storage hierarchies where there was some
historical data. Recent work by Touch and the Berkeley
work addressed the case of data on the World-Wide Web,
where the large size of images and the long latencies provide
extra incentive to pre-fetch; Touch's technique is to pre-send
when large bandwidths permit some speculation using
HTML storage references embedded in WEB pages, and the
Berkeley work uses techniques similar to J. M. Smith's
reference histories specialized to the semantics of HTML
data.

Successful pre-fetching depends on the ability of the
system to predict the next action or actions of the user. In the
context of the system for customized electronic identification
of desirable objects, it is possible to cluster users into
groups according to the similarity of their user profiles. Any
of the well-known pre-fetching methods that collect and
utilize aggregate statistics on past user behavior, in order to
predict future user behavior, may then be implemented so
as to collect and utilize a separate set of statistics for each
cluster of users. In this way, the system generalizes its access
pattern statistics from each user to similar users, without
generalizing among users who have substantially different
interests. The system may further collect and utilize a similar
set of statistics that describes the aggregate behavior of all
users; in cases where the system cannot confidently make a
prediction as to what a particular user will do, because the
relevant statistics concerning that user's user cluster are
derived from only a small amount of data, the system may
instead make its predictions based on the aggregate statistics
for all users, which are derived from a larger amount of data.

For the sake of concreteness, we now describe a particular
instantiation of a pre-fetching system, that both employs
these insights and that makes its pre-fetching decisions
through accurate measurement of the expected cost and
benefit of each potential pre-fetch.

Pre-fetching exhibits a cost-benefit tradeoff. Let t denote
the approximate number of minutes that pre-fetched files are
retained in local storage (before they are deleted to make
room for other pre-fetched files). If the system elects to
pre-fetch a file corresponding to a target object X, then the
user benefits from a fast response at no extra cost, provided
that the user explicitly requests target object X soon
thereafter. However, if the user does not request target object X
within t minutes of the pre-fetch, then the pre-fetch was
worthless, and its cost is an added cost that must be borne
(directly or indirectly) by the user. The first scenario
therefore provides benefit at no cost, while the second scenario
incurs a cost at no benefit. The system tries to favor the first
scenario by pre-fetching only those files that the user will
access anyway. Depending on the user's wishes, the system
may pre-fetch either conservatively, where it controls costs
by pre-fetching only files that the user is extremely likely to
request explicitly (and that are relatively cheap to retrieve),
or more aggressively, where it also pre-fetches files that the
user is only moderately likely to request explicitly, thereby
increasing both the total cost and (to a lesser degree) the total
benefit to the user.
In the system described herein, pre-fetching for a user U
is accomplished by the user's proxy server S. Whenever
proxy server S retrieves a user-requested file F from an
information server, it uses the identity of this file F and the
characteristics of the user, as described below, to identify a
group of other files G1 . . . Gk that the user is likely to access
soon. The user's request for file F is said to "trigger" files G1
. . . Gk. Proxy server S pre-fetches each of these triggered
files Gi as follows:

1. Unless file Gi is already stored locally (e.g., due to a
previous pre-fetch), proxy server S retrieves file Gi
from an appropriate information server and stores it
locally.

2. Proxy server S timestamps its local copy of file Gi as
having just been pre-fetched, so that file Gi will be
retained in local storage for a minimum of
approximately t minutes before being deleted.
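
The two steps above can be sketched in Python as a small cache kept by the proxy server. This is a minimal illustration rather than the system's actual implementation: the class name `PrefetchCache`, the injected `fetch` and `clock` callables, and the choice to evict a file as soon as its timestamp is older than t minutes are all assumptions (the description only requires retention for a minimum of approximately t minutes).

```python
import time
from typing import Callable, Dict


class PrefetchCache:
    """Local store kept by a proxy server for pre-fetched files (hypothetical)."""

    def __init__(self, fetch: Callable[[str], bytes], retention_minutes: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch                      # retrieves a file from an information server
        self._retention = retention_minutes * 60.0
        self._clock = clock                      # injectable so the sketch can be tested
        self._files: Dict[str, bytes] = {}
        self._stamps: Dict[str, float] = {}

    def prefetch(self, name: str) -> None:
        # Step 1: retrieve the file unless a local copy already exists.
        if name not in self._files:
            self._files[name] = self._fetch(name)
        # Step 2: timestamp the local copy as having just been pre-fetched.
        self._stamps[name] = self._clock()

    def evict_expired(self) -> None:
        # Assumption: delete a copy once its timestamp is older than t minutes.
        cutoff = self._clock() - self._retention
        for name in [n for n, ts in self._stamps.items() if ts < cutoff]:
            del self._files[name]
            del self._stamps[name]

    def get(self, name: str) -> bytes:
        # Serve from local storage when possible, otherwise go to the server.
        self.evict_expired()
        if name in self._files:
            return self._files[name]
        return self._fetch(name)
```

Re-stamping on every pre-fetch, rather than only on the first, means a repeatedly triggered file stays resident, which matches step 2 applying even when step 1 is skipped.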

Whenever user U (or, in principle, any other user registered
with proxy server S) requests proxy server S to retrieve a file
that has been pre-fetched and not yet deleted, proxy server
S can then retrieve the file from local storage rather than
from another server. In a variation on steps 1-2 above, proxy
server S pre-fetches a file Gi somewhat differently, so that
pre-fetched files are stored on the user's client processor q
rather than on server S:

1. If proxy server S has not pre-fetched file Gi in the past
t minutes, it retrieves file Gi and transmits it to user
U's client processor q.

2. Upon receipt of the message sent in step 1, client q
stores a local copy of file Gi if one is not currently
stored.

3. Proxy server S notifies client q that client q should 
timestamp its local copy of file Gi; this notification may 
be combined with the message transmitted in step 1, if 
any. 

4. Upon receipt of the message sent in step 3, client q 
timestamps its local copy of file Gi as having just been 
pre-fetched, so that file Gi will be retained in local 
storage for a minimum of approximately t minutes 
before being deleted. 

During the period that client q retains file Gi in local storage,
client q can respond to any request for file Gi (by user U or,
in principle, any other user of client q) immediately and
without the assistance of proxy server S.

The difficult task is for proxy server S, each time it
retrieves a file F in response to a request, to identify the files
G1 . . . Gk that should be triggered by the request for file F
and pre-fetched immediately. Proxy server S employs a
cost-benefit analysis, performing each pre-fetch whose
benefit exceeds a user-determined multiple of its cost; the user
may set the multiplier low for aggressive prefetching or high
for conservative prefetching. These pre-fetches may be
performed in parallel. The benefit of pre-fetching file Gi
immediately is defined to be the expected number of seconds
saved by such a pre-fetch, as compared to a situation where
Gi is left to be retrieved later (either by a later pre-fetch, or
by the user's request) if at all. The cost of pre-fetching file
Gi immediately is defined to be the expected cost for proxy
server S to retrieve file Gi, as determined for example by the
network locations of server S and file Gi and by information
provider charges, times 1 minus the probability that proxy
server S will have to retrieve file Gi within t minutes (to
satisfy either a later pre-fetch or the user's explicit request)
if it is not pre-fetched now.
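
The decision rule just defined can be written down directly. The following Python sketch assumes the three quantities it needs (expected seconds saved, expected retrieval cost, and the probability that the file would have to be retrieved within t minutes anyway) have already been estimated; the function and parameter names are hypothetical, and the user-determined multiplier is taken to absorb the conversion between cost units and seconds.

```python
def prefetch_cost(retrieval_cost: float, p_needed_within_t: float) -> float:
    """Expected retrieval cost times (1 - probability the file is needed anyway)."""
    return retrieval_cost * (1.0 - p_needed_within_t)


def should_prefetch(expected_seconds_saved: float, retrieval_cost: float,
                    p_needed_within_t: float, multiplier: float) -> bool:
    """Pre-fetch when the benefit exceeds the user's multiple of the cost.

    A low multiplier gives aggressive pre-fetching, a high one conservative.
    """
    cost = prefetch_cost(retrieval_cost, p_needed_within_t)
    return expected_seconds_saved > multiplier * cost
```

For instance, with 4 seconds of expected savings, a retrieval cost of 10 units and a multiplier of 2, a file that is 90% likely to be needed within t minutes is pre-fetched (its discounted cost is about 1 unit), while one that is only 50% likely is not (its discounted cost is 5 units).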

The above definitions of cost and benefit have some
attractive properties. For example, if users tend to retrieve
either file F1 or file F2 (say) after file F, and tend only in the
former case to subsequently retrieve file G1, then the system
will generally not pre-fetch G1 immediately after retrieving
file F: for, to the extent that the user is likely to retrieve file
F2, the cost of the pre-fetch is high, and to the extent that the
user is likely to retrieve file F1 instead, the benefit of the
pre-fetch is low, since the system can save as much or nearly
as much time by waiting until the user chooses F1 and
pre-fetching G1 only then.

The proxy server S may estimate the necessary costs and 
benefits by adhering to the following discipline: 

1. Proxy server S maintains a set of disjoint clusters of the
users in its user base, clustered according to their user
profiles.

2. Proxy server S maintains an initially empty set PFT of
"pre-fetch triples" <C,F,G>, where F and G are files,
and where C identifies either a cluster of users or the set
of all users in the user base of proxy server S. Each
pre-fetch triple in the set PFT is associated with several
stored values specific to that triple. Pre-fetch triples and
their associated values are maintained according to the
rules in 3 and 4.

3. Whenever a user U in the user base of proxy server S
makes a request R2 for a file G, or a request R2 that
triggers file G, then proxy server S takes the following
actions:

a. For C being the user cluster containing user U, and 
then again for C being the set of all users: 

b. For any request R0 for a file, say file F, made by user
U during the t minutes strictly prior to the request
R2:

c. If the triple <C,F,G> is not currently a member of the
set PFT, it is added to the set PFT with a count of 0,
a trigger-count of 0, a target-count of 0, a total
benefit of 0, and a timestamp whose value is the
current date and time.

d. The count of the triple <C,F,G> is increased by one. 

e. If file G was not triggered or explicitly retrieved by 
any request that user U made strictly in between 
requests RO and R2, then the target-count of the 
triple <C,F,G> is increased by one. 

f. If request R2 was a request for file G, then the total 
benefit of triple <C,F,G> is increased either by the 
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time elapsed between request RO and request R2, or 
by the expected time to retrieve file G, whichever is 
less. 

g. If request R2 was a request for file and G was 
triggered or explicitly retrieved by one or more 
requests that user U made strictly in between 
requests RO and R2, with Rl denoting the earliest 
such request, then the total benefit of triple <C,F.G> 
is decreased either by the time elapsed between 
request Rl and request R2, or by the expected time 
to retrieve file G, whichever is less. 

4. If a user U requests a file F, then the trigger-count is
incremented by one for each triple currently in the set
PFT such that the triple has form <C,F,G>, where user
U is in the set or cluster identified by C.

5. The "age" of a triple <C,F,G> is defined to be the
number of days elapsed between its timestamp and the
current date and time. If the age of any triple <C,F,G>
exceeds a fixed constant number of days, and also
exceeds a fixed constant multiple of the triple's count,
then the triple may be deleted from the set PFT.
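
The bookkeeping in steps 3 through 5 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the names `TripleStats`, `PrefetchTable`, `on_request` and `purge`, the `"ALL"` sentinel standing for the set of all users, the in-memory request history, the use of seconds for the interval t, and the choice to apply step 3 before step 4 for a single request are all hypothetical.

```python
from dataclasses import dataclass, field

ALL_USERS = "ALL"  # assumed sentinel for "the set of all users"


@dataclass
class TripleStats:
    count: int = 0
    trigger_count: int = 0
    target_count: int = 0
    total_benefit: float = 0.0
    timestamp: float = 0.0  # creation time, in seconds


@dataclass
class PrefetchTable:
    window: float   # the interval t, expressed in seconds (assumption)
    cluster_of: dict  # user -> cluster identifier
    pft: dict = field(default_factory=dict)      # (C, F, G) -> TripleStats
    history: dict = field(default_factory=dict)  # user -> [(time, file, files obtained)]

    def on_request(self, user, now, requested, triggered=(), expected_time=None):
        """Record request R2 made at time `now` for file `requested`,
        which also triggers the files in `triggered`."""
        expected_time = expected_time or {}
        hist = self.history.setdefault(user, [])
        recent = [h for h in hist if now - self.window <= h[0] < now]
        clusters = (self.cluster_of[user], ALL_USERS)
        for g in {requested, *triggered}:        # step 3: each file G obtained by R2
            for c in clusters:                   # 3a: user's cluster, then all users
                for t0, f, _ in recent:          # 3b: each earlier request R0 for F
                    s = self.pft.setdefault((c, f, g), TripleStats(timestamp=now))  # 3c
                    s.count += 1                 # 3d
                    # Times of requests strictly between R0 and R2 that obtained G.
                    between = [t1 for t1, _, got in recent if t0 < t1 and g in got]
                    if not between:              # 3e
                        s.target_count += 1
                    if g == requested:
                        cap = expected_time.get(g, float("inf"))
                        s.total_benefit += min(now - t0, cap)            # 3f
                        if between:                                      # 3g
                            s.total_benefit -= min(now - min(between), cap)
        for (c, f, _), s in self.pft.items():    # step 4
            if f == requested and c in clusters:
                s.trigger_count += 1
        hist.append((now, requested, frozenset({requested, *triggered})))

    def purge(self, now, max_age_days, age_per_count):
        """Step 5: drop triples that are both old and rarely observed."""
        for key, s in list(self.pft.items()):
            age = (now - s.timestamp) / 86400.0
            if age > max_age_days and age > age_per_count * s.count:
                del self.pft[key]
```

Each observed pair of requests updates two triples, one for the user's own cluster and one for the set of all users, so that cluster-level statistics can be preferred later whenever enough of them have accumulated.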
Proxy server S can therefore decide rapidly which files G 
should be triggered by a request for a given file F from 
a given user U, as follows. 

1. Let C0 be the user cluster containing user U, and C1 be
the set of all users.

2. Server S constructs a list L of all triples <C0,F,G> such
that <C0,F,G> appears in set PFT with a count exceeding
a fixed threshold.

3. Server S adds to list L all triples <C1,F,G> such that
<C0,F,G> does not appear on list L and <C1,F,G>
appears in set PFT with a count exceeding another fixed
threshold.

4. For each triple <C,F,G> on list L:

5. Server S computes the cost of triggering file G to be
the expected cost of retrieving file G, times 1 minus the
quotient of the target-count of <C,F,G> by the
trigger-count of <C,F,G>.

6. Server S computes the benefit of triggering file G to be 
the total benefit of <C,F,G> divided by the count of 
<C,F,G>. 

7. Finally, proxy server S uses the computed cost and
benefit, as described earlier, to decide whether file G
should be triggered.

The approach to pre-fetching just described has the advantage
that all data storage and manipulation concerning pre-fetching
decisions by proxy server S is handled locally at proxy server S.
However, this "user-based" approach does lead to duplicated
storage and effort across proxy servers, as well as incomplete
data at each individual proxy server. That is, the information
indicating what files are frequently retrieved after file F is
scattered in an uncoordinated way across numerous proxy servers.
An alternative, "file-based" approach is to store all such
information with file F itself. The difference is as follows. In
the user-based approach, a pre-fetch triple <C,F,G> in server
S's set PFT may mention any file F and any file G on the
network, but is restricted to clusters C that are subsets of the
user base of server S. By contrast, in the file-based approach,
a pre-fetch triple <C,F,G> in server S's set PFT may mention any
user cluster C and any file G on the network, but is restricted
to files F that are stored on server S. (Note that in the
file-based approach, user clustering is network-wide, and user
clusters may include users from different proxy servers.) When a
proxy server S2 sends a request to server S to retrieve file F
for a user U, server S2 indicates in this message the user U's
user cluster C0, as well as the user U's value for the
user-determined multiplier that is used in cost-benefit
analysis. Server S can use this information, together with all
its triples in its set PFT of the form <C0,F,G> and <C1,F,G>,
where C1 is the set of all users everywhere on the network, to
determine (exactly as in the user-based approach) which files
G1 . . . Gk are triggered by the request for file F. When server
S sends file F back to proxy server S2, it also sends this list
of files G1 . . . Gk, so that proxy server S2 can proceed to
pre-fetch files G1 . . . Gk.
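
The decision procedure in steps 1 through 7 above can be sketched as follows. The sketch is hypothetical in several respects: the `Stats` record, the function name `files_to_trigger`, the two threshold values, and the `"ALL"` key for the set of all users are illustrative, and the final test (benefit times the user-determined multiplier exceeding cost) is only an assumed stand-in for the cost-benefit comparison that the text defines elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    count: int
    trigger_count: int
    target_count: int
    total_benefit: float


def files_to_trigger(pft, cluster, f, retrieval_cost, multiplier=1.0,
                     cluster_threshold=5, global_threshold=20, all_users="ALL"):
    """Return the files G to pre-fetch after a request for file f.

    pft maps (C, F, G) -> Stats; retrieval_cost maps G -> expected cost.
    """
    chosen = {}
    # Step 2: triples for the user's own cluster with enough observations.
    for (c, ff, g), s in pft.items():
        if c == cluster and ff == f and s.count > cluster_threshold:
            chosen[g] = s
    # Step 3: fall back to all-user triples for files not already listed.
    for (c, ff, g), s in pft.items():
        if (c == all_users and ff == f and g not in chosen
                and s.count > global_threshold):
            chosen[g] = s
    result = []
    for g, s in chosen.items():                                   # step 4
        if s.trigger_count == 0:
            continue  # ratio undefined without trigger observations
        cost = retrieval_cost[g] * (1 - s.target_count / s.trigger_count)  # step 5
        benefit = s.total_benefit / s.count                       # step 6
        if benefit * multiplier > cost:                           # step 7 (assumed rule)
            result.append(g)
    return sorted(result)
```

Under the file-based approach the same function could run on the server that stores file F, with `cluster` and `multiplier` taken from the request message sent by proxy server S2.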
The file-based approach requires some additional data
transmission. Recall that under the user-based approach,
server S must execute steps 3c-3g above for any ordered
pair of requests R0 and R2 made within t minutes of each
other by a user who employs server S as a proxy server.
Under the file-based approach, server S must execute steps
3c-3g above for any ordered pair of requests R0 and R2
made within t minutes of each other, by any user on the
network, such that R0 requests a file stored on server S.
Therefore, when a user makes a request R2, the user's proxy
server must send a notification of request R2 to all servers
S such that, during the preceding t minutes (where the
variable t may now depend on server S), the user has made
a request R0 for a file stored on server S. This notification
need not be sent immediately, and it is generally more
efficient for each proxy server to buffer up such notifications
and send them periodically in groups to the appropriate
servers.
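
A minimal sketch of the buffering described above is given below. It assumes a lookup table from each file to the server that stores it and a per-server window t in seconds; the class name `NotificationBuffer`, its methods, and the tuple layout of a notification are hypothetical.

```python
from collections import defaultdict


class NotificationBuffer:
    """Buffers notifications of request pairs (R0, R2) per file-holding server."""

    def __init__(self, home_of, window_for):
        self.home_of = home_of            # file -> server storing it (assumed lookup)
        self.window_for = window_for      # server -> window t in seconds
        self.recent = defaultdict(list)   # user -> [(time, file)]
        self.pending = defaultdict(list)  # server -> buffered notifications

    def on_request(self, user, now, file):
        # For every earlier request R0 inside the window of the server that
        # stores R0's file, queue a notification of the new request R2.
        for t0, f in self.recent[user]:
            server = self.home_of[f]
            if now - self.window_for[server] <= t0 < now:
                self.pending[server].append((t0, f, now, file))
        self.recent[user].append((now, file))

    def flush(self):
        """Return the buffered notifications, grouped by server, and reset."""
        batches, self.pending = dict(self.pending), defaultdict(list)
        return batches
```

Calling `flush` on a timer, rather than sending each notification at once, corresponds to the periodic group transmission that the text recommends.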

Access And Reachability Control of Users and User-Specific 
Information 

Although users' true identities are protected by the use of
secure mix paths, pseudonymity does not guarantee com- 
plete privacy. In particular, advertisers can in principle 
employ user-specific data to barrage users with unwanted 
solicitations. The general solution to this problem is for 
proxy server S2 to act as a representative on behalf of each 
user in its user base, permitting access to the user and the 
user's private data only in accordance with criteria that have 
been set by the user. Proxy server S2 can restrict access in 
two ways: 

1. The proxy server S2 may restrict access by third parties
to server S2's pseudonymous database of user-specific
information. When a third party such as an advertiser
sends a message to server S2 requesting the release of
user-specific information for a pseudonym P, server S2
refuses to honor the request unless the message
includes credentials for the accessor adequate to prove
that the accessor is entitled to this information. The user
associated with pseudonym P may at any time send
signed control messages to proxy server S2, specifying
the credentials or Boolean combinations of credentials
that proxy server S2 should thenceforth consider to be
adequate grounds for releasing a specified subset of the
information associated with pseudonym P. Proxy server
S2 stores these access criteria with its database record
for pseudonym P. For example, a user might wish
proxy server S2 to release purchasing information only
to selected information providers, to charitable
organizations (that is, organizations that can provide a
government-issued credential that is issued only to
registered charities), and to market researchers who
have paid user U for the right to study user U's
purchasing habits.

2. The proxy server S2 may restrict the ability of third
parties to send electronic messages to the user. When a
third party such as an advertiser attempts to send
information (such as a textual message or a request to
enter into spoken or written real-time communication)
to pseudonym P, by sending a message to proxy server
S2 requesting proxy server S2 to forward the
information to the user at pseudonym P, proxy server S2
will refuse to honor the request, unless the message includes
credentials for the accessor adequate to meet the
requirements the user has chosen to impose, as above,
on third parties who wish to send information to the
user. If the message does include adequate credentials,
then proxy server S2 removes a single-use pseudonymous
return address envelope from its database record
for pseudonym P, and uses the envelope to send a
message containing the specified information along a
secure mix path to the user of pseudonym P. If the
envelope being used is the only envelope stored for
pseudonym P, or more generally if the supply of such
envelopes is low, proxy server S2 adds a notation to this
message before sending it, which notation indicates to
the user's local server that it should send additional
envelopes to proxy server S2 for future use.
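
The handling of single-use return envelopes in item 2 can be illustrated with a short sketch. It assumes that the credential check has already produced a yes/no result and models an envelope as an opaque value; the class name `EnvelopeStore`, the `low_water` threshold for a "low" supply, and the dictionary returned by `forward` are hypothetical.

```python
from collections import deque


class EnvelopeStore:
    """Single-use return envelopes held by the proxy server for one pseudonym."""

    def __init__(self, envelopes, low_water=2):
        self.envelopes = deque(envelopes)
        self.low_water = low_water  # assumed threshold for "supply is low"

    def forward(self, payload, has_credentials):
        if not has_credentials:
            return None  # request refused: credentials inadequate
        if not self.envelopes:
            raise LookupError("no return envelope available")
        envelope = self.envelopes.popleft()  # each envelope is used only once
        # Ask the user's local server for more envelopes if this was the
        # last one, or if the remaining supply is below the threshold.
        need_more = len(self.envelopes) < self.low_water
        return {"envelope": envelope, "payload": payload,
                "request_envelopes": need_more}
```

Because an envelope is removed before use, no two forwarded messages share a return address, which matches the single-use property that the text relies on.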

In a more general variation, the user may instruct the
proxy server S2 to impose more complex requirements on
the granting of requests by third parties, not simply Boolean
combinations of required credentials. The user may impose
any Boolean combination of simple requirements that may
include, but are not limited to, the following:

(a.) the accessor (third party) is a particular party 

(b.) the accessor has provided a particular credential 

(c.) satisfying the request would involve disclosure to the 
accessor of a certain fact about the user's user profile 

(d.) satisfying the request would involve disclosure to the 
accessor of the user's target profile interest summary 

(e.) satisfying the request would involve disclosure to the
accessor of statistical summary data, which data are
computed from the user's user profile or target profile
interest summary together with the user profiles and
target profile interest summaries of at least n other users
in the user base of the proxy server

(f.) the content of the request is to send the user a target 
object, and this target object has a particular attribute 
(such as high reading level, or low vulgarity, or an 
authenticated Parental Guidance rating from the 
MPAA) 

(g.) the content of the request is to send the user a target 
object, and this target object has been digitally signed 
with a particular private key (such as the private key 
used by the National Pharmaceutical Association to 
certify approved documents) 

(h.) the content of the request is to send the user a target
object, and the target profile has been digitally signed
by a profile authentication agency, guaranteeing that
the target profile is a true and accurate profile of the
target object it claims to describe, with all attributes
authenticated.

(i.) the content of the request is to send the user a target 
object, and the target profile of this target object is 
within a specified distance of a particular search profile 
specified by the user 

(j.) the content of the request is to send the user a target
object, and the proxy server S2, by using the user's
stored target profile interest summary, estimates the
user's likely interest in the target object to be above a
specified threshold

(k.) the accessor indicates its willingness to make a
particular payment to the user in exchange for the
fulfillment of the request
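
A Boolean combination of simple requirements such as (a.) through (k.) can be sketched as composable predicates over a request. The `Request` fields, the helper names, and the example policy below are hypothetical illustrations, not an encoding prescribed by the text.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    accessor: str
    credentials: set = field(default_factory=set)
    payment: float = 0.0


def is_party(name):                 # requirement (a.)
    return lambda r: r.accessor == name


def has_credential(credential):     # requirement (b.)
    return lambda r: credential in r.credentials


def pays_at_least(amount):          # requirement (k.)
    return lambda r: r.payment >= amount


def all_of(*predicates):            # Boolean AND
    return lambda r: all(p(r) for p in predicates)


def any_of(*predicates):            # Boolean OR
    return lambda r: any(p(r) for p in predicates)


def negate(predicate):              # Boolean NOT
    return lambda r: not predicate(r)


# Example policy: honor registered charities, or market researchers
# who offer at least 1.00 in payment.
policy = any_of(
    has_credential("registered_charity"),
    all_of(has_credential("market_researcher"), pays_at_least(1.00)),
)
```

The composed `policy` is true for a request the user wants honored and false otherwise, so the proxy server would need only one call per request and pseudonym.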
The steps required to create and maintain the user's 
access-control requirements are as follows: 

1. The user composes a Boolean combination of predi- 
cates that apply to requests; the resulting complex 
predicate should be true when applied to a request that 
the user wants proxy server S2 to honor, and false 
otherwise. The complex predicate may be encoded in 
another form, for efficiency. 

2. The complex predicate is signed with SKp, and
transmitted from the user's client processor C3 to the proxy
server S2 through the mix path enclosed in a packet that
also contains the user's pseudonym P.

3. The proxy server S2 receives the packet, verifies its
authenticity using PKp, and stores the access control
instructions specified in the packet as part of its
database record for pseudonym P.

The proxy server S2 enforces access control as follows:

1. The third party (accessor) transmits a request to proxy
server S2 using the normal point-to-point connections
provided by the network N. The request may be to
access the target profile interest summaries associated
with a set of pseudonyms P1 . . . Pn, or to access the
user profiles associated with a set of pseudonyms
P1 . . . Pn, or to forward a message to the users associated
with pseudonyms P1 . . . Pn. The accessor may explicitly
specify the pseudonyms P1 . . . Pn, or may ask that
P1 . . . Pn be chosen to be the set of all pseudonyms
registered with proxy server S2 that meet specified
conditions.

2. The proxy server S2 indexes the database record for
each pseudonym Pi (1<=i<=n), retrieves the access
requirements provided by the user associated with Pi,
and determines whether and how the transmitted
request should be satisfied for Pi. If the requirements
are satisfied, S2 proceeds with steps 3a-3c.

3a. If the request can be satisfied but only upon payment 
of a fee, the proxy server S2 transmits a payment 
request to the accessor, and waits for the accessor to 
send the payment to the proxy server S2. Proxy server 
S2 retains a service fee and forwards the balance of the
payment to the user associated with pseudonym Pi, via 
an anonymous return packet that this user has provided. 

3b. If the request can be satisfied but only upon provision 
of a credential, the proxy server S2 transmits a creden- 
tial request to the accessor, and waits for the accessor 
to send the credential to the proxy server S2. 

3c. The proxy server S2 satisfies the request by disclosing 
user-specific information to the accessor, by providing 
the accessor with a set of single-use envelopes to 
communicate directly with the user, or by forwarding a 
message to the user, as requested. 

4. Proxy server S2 optionally sends a message to the
accessor, indicating why each of the denied requests for
P1 . . . Pn was denied, and/or indicating how many
requests were satisfied.

5. The active and/or passive relevance feedback provided
by any user U with respect to any target object sent by
any path from the accessor is tabulated by the above-
described tabulating process resident on user U's client
processor C3. As described above, a summary of such
information is periodically transmitted to the proxy
server S2 to enable the proxy server S2 to update that
user's target profile interest summary and user profile.

The access control criteria can be applied to solicited as
well as unsolicited transmissions. That is, the proxy server
can be used to protect the user from inappropriate or
misrepresented target objects that the user may request. If
the user requests a target object from an information server,
but the target object turns out not to meet the access control
criteria, then the proxy server will not permit the information
server to transmit the target object to the user, or to charge
the user for such transmission. For example, to guard against
target objects whose profiles have been tampered with, the
user may specify an access control criterion that requires the
provider to prove the target profile's accuracy by means of
a digital signature from a profile authentication agency. As
another example, the parents of a child user may instruct the
proxy server that only target objects that have been digitally
signed by a recognized child protection organization may be
transmitted to the user; thus, the proxy server will not let the
user retrieve pornography, even from a rogue information
server that is willing to provide pornography to users who
have not supplied an adulthood credential.

Distribution of Information with Multicast Trees

The graphical representation of the network N presented
in FIG. 3 shows that at least one of the data communications
links can be eliminated, as shown in FIG. 4, while still
enabling the network N to transmit messages among all the
servers A-D. By elimination, we mean that the link is
unused in the logical design of the network, rather than a
physical disconnection of the link. The graphs that result
when all redundant data communications links are eliminated
are termed "trees" or "connected acyclic graphs." A
graph where a message could be transmitted by a server
through other servers and then return to the transmitting
server over a different originating data communications link
is termed a "cycle." A tree is thus an acyclic graph whose
edges (links) connect a set of graph "nodes" (servers). The
tree can be used to efficiently broadcast any data file to
selected servers in a set of interconnected servers.

The tree structure is attractive in a communications network
because much information distribution is multicast in
nature; that is, a piece of information available at a single
source must be distributed to a multiplicity of points where
the information can be accessed. This technique is widely
known: for example, "FAX trees" are in common use in
political organizations, and multicast trees are widely used
in distribution of multimedia data in the Internet; for
example, see "Scalable Feedback Control for Multicast
Video Distribution in the Internet," (Jean-Chrysostome
Bolot, Thierry Turletti, & Ian Wakeman, Computer
Communication Review, Vol. 24, #4, October, '94, Proceedings
of SIGCOMM'94, pp. 58-67) or "An Architecture For
Wide-Area Multicast Routing," (Stephen Deering, Deborah
Estrin, Dino Farinacci, Van Jacobson, Ching-Gung Liu, &
Liming Wei, Computer Communication Review, Vol. 24, #4,
October, '94, Proceedings of SIGCOMM'94, pp. 126-135).
While there are many possible trees that can be overlaid on
a graph representation of a network, both the nature of the
networks (e.g., the cost of transmitting data over a link) and
their use (for example, certain nodes may exhibit more
frequent intercommunication) can make one choice of tree
better than another for use as a multicast tree. One of the
most difficult problems in practical network design is the
construction of "good" multicast trees, that is, tree choices
which exhibit low cost (due to data not traversing links
unnecessarily) and good performance (due to data frequently
being close to where it is needed).

Constructing a Multicast Tree

Algorithms for constructing multicast trees have either
been ad hoc, as is the case of the Deering, et al. Internet
multicast tree, which adds clients as they request service by
grafting them into the existing tree, or by construction of a
minimum cost spanning tree. A distributed algorithm for
creating a spanning tree (defined as a tree that connects, or
"spans," all nodes of the graph) on a set of Ethernet bridges
was developed by Radia Perlman ("Interconnections:
Bridges and Routers," Radia Perlman, Addison-Wesley,
1992). Creating a minimal-cost spanning tree for a graph
depends on having a cost model for the arcs of the graph
(corresponding to communications links in the communications
network). In the case of Ethernet bridges, the default
cost (more complicated costing models for path costs are
discussed on pp. 72-73 of Perlman) is calculated as a simple
distance measure to the root; thus the spanning tree minimizes
the cost to the root by first electing a unique root and
then constructing a spanning tree based on the distances
from the root. In this algorithm, the root is elected by
recourse to a numeric ID contained in "configuration
messages": the server whose ID has minimum numeric value is
chosen as the root. Several problems exist with this algorithm
in general. First, the method of using an ID does not
necessarily select the best root for the nodes interconnected
in the tree. Second, the cost model is simplistic.

We first show how to use the similarity-based methods
described above to select the servers most interested in a
group of target objects, herein termed "core servers" for that
group. Next we show how to construct an unrooted multicast
tree that can be used to broadcast files to these core servers.
Finally, we show how files corresponding to target objects
are actually broadcast through the multicast tree at the
initiative of a client, and how these files are later retrieved
from the core servers when clients request them.

Since the choice of core servers to distribute a file to
depends on the set of users who are likely to retrieve the file
(that is, the set of users who are likely to be interested in the
corresponding target object), a separate set of core servers
and hence a separate multicast tree may be used for each
topical group of target objects. Throughout the description
below, servers may communicate among themselves
through any path over which messages can travel; the goal
of each multicast tree is to optimize the multicast distribution
of files corresponding to target objects of the corresponding
topic. Note that this problem is completely distinct
from selecting a multiplicity of spanning trees for the
complete set of interconnected nodes as disclosed by
Sincoskie in U.S. Pat. No. 4,706,080 and the publication titled
"Extended Bridge Algorithms for Large Networks" by W. D.
Sincoskie and C. J. Cotton, published January 1988 in IEEE
Network on pages 16-24. The trees in this disclosure are
intentionally designed to interconnect a selected subset of
nodes in the system, and are successful to the degree that this
subset is relatively small.

Multicast Tree Construction Procedure

A set of topical multicast trees for a set of homogenous
target objects may be constructed or reconstructed at any
time, as follows. The set of target objects is grouped into a
fixed number of topical clusters C1 . . . Cp with the methods
described above, for example, by choosing C1 . . . Cp to be
the result of a k-means clustering of the set of target objects,
or alternatively a covering set of low-level clusters from a
hierarchical cluster tree of these target objects. A multicast
tree MT(C) is then constructed from each cluster C in
C1 . . . Cp, by the following procedure:

1. Given a set of proxy servers S1 . . . Sn and a topical
cluster C, it is assumed that a general multicast tree MTfull
that contains all the proxy servers S1 . . . Sn has previously
been constructed by well-known methods.

2. Each pair <Si, C> is associated with a weight, w(Si, C),
which is intended to covary with the expected number of
users in the user base of proxy server Si who will
subsequently access a target object from cluster C. This weight
is computed by proxy server Si in any of several ways, all of
which make use of the similarity measurement computation
described herein.

One variation makes use of the following steps:

(a) Proxy server Si randomly selects a target object T from
cluster C.

(b) For each pseudonym in its local database, with associated
user U, proxy server Si applies the techniques disclosed
above to user U's stored user profile and target profile
interest summary in order to estimate the interest w(U, T)
that user U has in the selected target object T. The aggregate
interest w(Si, T) that the user base of proxy server Si has in
the target object T is defined to be the sum of these interest
values w(U, T). Alternatively, w(Si, T) may be defined to be
the sum of values s(w(U, T)) over all U in the user base.
Here s( ) is a sigmoidal function that is close to 0 for small
arguments and close to a constant for large arguments;
thus s(w(U, T)) estimates the probability that user U will
access target object T, which probability is assumed to be
independent of the probability that any other user will access
target object T. In a variation, w(Si, T) is made to estimate
the probability that at least one user from the user base of Si
will access target object T: then w(Si, T) may be defined as
the maximum of values w(U, T), or as 1 minus the product
over the users U of the quantity (1 - s(w(U, T))).

(c) Proxy server Si repeats steps (a)-(b) for several target
objects T selected randomly from cluster C, and averages the
several values of w(Si, T) thereby computed in step (b) to
determine the desired quantity w(Si, C), which quantity
represents the expected aggregate interest by the user base of
proxy server Si in the target objects of cluster C.

In another variation, where target profile interest summaries
are embodied as search profile sets, the following
procedure is followed to compute w(Si, C):

(a) For each search profile Ps in the locally stored search
profile set of any user in the user base of proxy server Si,
proxy server Si computes the distance d(Ps, Pc) between the
search profile and the cluster profile Pc of cluster C.

(b) w(Si, C) is chosen to be the maximum value of
(-d(Ps, Pc)/r) across all such search profiles Ps, where r is
computed as an affine function of the cluster diameter of
cluster C. The slope and/or intercept of this affine function
are chosen to be smaller (thereby increasing w(Si, C)) for
servers Si for which the target object provider wishes to
improve performance, as may be the case if the users in the
user base of proxy server Si pay a premium for improved
performance, or if performance at Si will otherwise be
unacceptably low due to slow network connections.

In another variation, the proxy server Si is modified so
that it maintains not only target profile interest summaries
for each user in its user base, but also a single aggregate
target profile interest summary for the entire user base. This
aggregate target profile interest summary is determined in
the usual way from relevance feedback, but the relevance
feedback on a target object, in this case, is considered to be
the frequency with which users in the user base retrieved the
target object when it was new. Whenever a user retrieves a
target object by means of a request to proxy server Si, the
aggregate target profile interest summary for proxy server Si
is updated. In this variation, w(Si, C) is estimated by the
following steps:

(a) Proxy server Si randomly selects a target object T from
cluster C.

(b) Proxy server Si applies the techniques disclosed above
to its stored aggregate target profile interest summary in
order to estimate the aggregate interest w(Si, T) that its
aggregated user base had in the selected target object T,
when new; this may be interpreted as an estimate of the
likelihood that at least one member of the user base will
retrieve a new target object similar to T.

(c) Proxy server Si repeats steps (a)-(b) for several target
objects T selected randomly from cluster C, and averages the
several values of w(Si, T) thereby computed in step (b) to
determine the desired quantity w(Si, C), which quantity
represents the expected aggregate interest by the user base of
proxy server Si in the target objects of cluster C.

3. Those servers Si from among S1 . . . Sn with the
greatest weights w(Si, C) are designated "core servers" for
cluster C. In one variation, where it is desired to select a
fixed number of core servers, those servers Si with the
greatest values of w(Si, C) are selected. In another variation,
the value of w(Si, C) for each server Si is compared against
a fixed threshold wmin, and those servers Si such that
w(Si, C) equals or exceeds wmin are selected as core servers.
If cluster C represents a narrow and specialized set of target
objects, as often happens when the clusters C1 . . . Cp are
numerous, it is usually adequate to select only a small
number of core servers for cluster C, thereby obtaining
substantial advantages in computational efficiency in steps 4-5
below.

4. A complete graph G(C) is constructed whose vertices
are the designated core servers for cluster C. For each pair
of core servers, the cost of transmitting a message between
those core servers along the cheapest path is estimated, and
the weight of the edge connecting those core servers is taken
to be this cost. The cost is determined as a suitable function
of average transmission charges, average transmission delay,
and worst-case or near-worst-case transmission delay.

5. The multicast tree MT(C) is computed by standard
methods to be the minimum spanning tree (or a near-
minimum spanning tree) for G(C), where the weight of an
edge between two core servers is taken to be the cost of
transmitting a message between those two core servers. Note
that MT(C) does not contain as vertices all proxy servers
S1 . . . Sn, but only the core servers for cluster C.

6. A message M is formed describing the cluster profile
for cluster C, the core servers for cluster C and the topology
of the multicast tree MT(C) constructed on those core
servers. Message M is broadcast to all proxy servers
S1 . . . Sn by means of the general multicast tree MTfull.
Each proxy server Si, upon receipt of message M, extracts the
cluster profile of cluster C, and stores it on a local storage
device, together with certain other information that it
determines from message M, as follows. If proxy server Si is
named in message M as a core server for cluster C, then proxy
server Si extracts and stores the subtree of MT(C) induced by
all core servers whose path distance from Si in the graph MT(C)
is less than or equal to d, where d is a constant positive
integer (usually from 1 to 3). If message M does not name
proxy server Si as a core server for MT(C), then proxy server
Si extracts and stores a list of one or more nearby core
servers that can be inexpensively contacted by proxy server
Si over virtual point-to-point links.

In the network of FIG. 3, to illustrate the use of trees, as
applied to the system of the present invention, consider the
following simple example where it is assumed that client r
provides on-line information for the network, such as an
electronic newspaper. This information can be structured by
client r into a prearranged form, comprising a number of
files, each of which is associated with a different target
object. In the case of an electronic newspaper, the files can
contain textual representations of stock prices, weather
forecasts, editorials, etc. The system determines likely
demand for the target objects associated with these files in
order to optimize the distribution of the files through the
network N of interconnected clients p-s and proxy servers
A-D. Assume that cluster C consists of text articles relating
to the aerospace industry; further assume that the target
profile interest summaries stored at proxy servers A and B
for the users at clients p and r indicate that these users are
strongly interested in such articles. Then the proxy servers A
and B are selected as core servers for the multicast tree
MT(C). The multicast tree MT(C) is then computed to
consist of the core servers, A and B, connected by an edge
that represents the least costly virtual point-to-point link
between A and B (either the direct path A-B or the indirect
path A-C-B, depending on the cost).

Global Requests to Multicast Trees

One type of message that may be transmitted to any proxy
server S is termed a "global request message." Such a
message M triggers the broadcast of an embedded request R
to all core servers in a multicast tree MT(C). The content of
request R and the identity of cluster C are included in the
message M, as is a field indicating that message M is a
global request message. In addition, the message M contains
a field Slast which is unspecified except under certain
circumstances described below, when it names a specific core
server. A global request message M may be transmitted to
proxy server S by a user registered with proxy server S,
which transmission may take place along a pseudonymous
mix path, or it may be transmitted to proxy server S from
another proxy server, along a virtual point-to-point
connection.

When a proxy server S receives a message M that is
marked as a global request message, it acts as follows: 1. If
proxy server S is not a core server for topic C, it retrieves its

in most circumstances, provided that d>1; higher values of
d provide additional insurance against unreachable core
servers.

Multicasting Files

The system for customized electronic identification of
desirable objects executes the following steps in order to
introduce a new target object into the system. These steps are
initiated by an entity E, which may be either a user entering
commands via a keyboard at a client processor q, as
illustrated in FIG. [illegible], or an automatic software process
resident on a client or server processor q.

1. Processor q forms a signed request R, which asks the
receiver to store a copy of a file F on its local storage
device. The file F, which is maintained by client q on storage
at client q or on storage accessible by client q over the
network, contains the informational content of or an
identifying description of a target object, as described above.
The request R also includes an address at which entity E may
be contacted (possibly a pseudonymous address at some proxy
server D), and asks the receiver to store the fact that file F
is maintained by an entity at said address.

2. Processor q embeds request R in a message M1,
which it pseudonymously transmits to the entity E's proxy
server D as described above. Message M1 instructs proxy
server D to broadcast request R along an appropriate
multicast tree.

3. Upon receipt of message M1, proxy server D
examines the doubly embedded file F and computes a target
profile P for the corresponding target object. It compares the
target profile P to each of the cluster profiles for topical
clusters C1 . . . Cp described above, and chooses Ck to be
the cluster with the smallest similarity distance to profile P.

4. Proxy server D sends itself a global request message M
instructing itself to broadcast request R along the topical
multicast tree MT(Ck).

5. Proxy server D notifies entity E through a pseudonymous
communication that file F has been multicast along the
topical multicast tree for cluster Ck.

As a result of the procedure that server D and other
servers follow for acting on global request messages, step 4
eventually causes all core servers for topic Ck to act on
request R and therefore store a local copy of file F. In order
to make room for file F on its local storage device, a core

locally stored Ust of nearby core servers for topic C, selects 40 server Si may have to delete a less useful file. There are 

from this list a nearby core server S', and transmits a copy several ways to choose a file to delete. One option, well 

of message M over a virtual point-to-point connection to known in the art, is for Si to choose to delete the least 

core server S'. If this transmission fails, proxy server S recently accessed file. In another variation. Si deletes a file 

repeats the procedure with other core servers on its list. 2. If that it beheves few users will access. In this variation, 

proxy server S is a core server for topic C, it executes the 45 whenever a server Si stores a copy of a file F, it also 

following steps: (a) Act on the request R that is embedded computes and stores the weight w(Si, C^), where C^ is a 

in message M. (b) Set S,^ to be S(C) Retrieve the locally cluster consisting of the single target object associated with 

stored subtree of MT(C), and extract from it a list L of all file F. Then, when server Si needs to delete a file, it chooses 

core servers that are directly linked to S^^ in this subtree. to delete the file F with the lowest weight w(Si, C^r). To 

(d) If the message M specifies a value for S,^, and S/„, so reflect the fact that files are accessed less as they age, server 

appears on the list L, remove S,^, from the fist L. Note that Si periodically multipUes its stored value of w(Si, C^) by a 

fist L may be empty before this step, or may become empty decay factor, such as 0.95, for each file F that it then stores, 

as a result of this step, (e) For each server Si in list L, Alter natively, instead ofusing a decay factor, server Si may 

transmit a copy of message M from server S to server Si over periodically recompute aggregate interest w(Si, C^) for each 

a virtual point-to-point connection, where the S^ field of 55 file F that it stores; the aggregate interest changes over time 

the copy of message M has been altered to S^^ If Si cannot because target objects typically have. an age attribute that the 

be reached in a reasonable amount of time by any virtual system considers in estimating user interest, as described 

point-to-point connection (for example, server Si is broken), above. 

recurse to step (c) above with S^.^ bound to S^„^ and S^„^ If entity E later wishes to remove file F from the network, 

bound to S{\sub 1} for the duration of the recursion. 60 for example because it has just multicast an updated version. 

When server S' in step 1 or a server Si in step 2(e) receives it pseudonymously transmits a digitally signed global 

a copy of the global request message M, it acts according to request message to proxy server D, requesting all proxy 

exactly the same steps. As a result, all core servers eventu- servers in the multicast tree MT(Ck) to delete any local copy 

ally receive a copy of global request message M and act 00 of file F that they may be storing, 

the embedded request R, unless some core servers cannot be 6S Queries to Multicast Trees 

reached. Even if a core server is unreachable, step (e) In addition to global request messages, another type of 

ensures that the broadcast can continue to other core servers message that may be transmitted to any proxy server S is 
termed a "query message." When transmitted to a proxy
server, a query message causes a reply to be sent to the
originator of the message; this reply will contain an answer
to a given query Q if any of the servers in a given multicast
tree MT(C) are able to answer it, and will otherwise indicate
that no answer is available. The query and the cluster C are
named in the query message. In addition, the query message
contains a field S_last which is unspecified except under
certain circumstances described below, when it names a
specific core server. When a proxy server S receives a
message M that is marked as a query message, it acts as
follows:
1. Proxy server S sets A_r to be the return address for the
client or server that transmitted message M to server S. A_r
may be either a network address or a pseudonymous address.
2. If proxy server S is not a core server for cluster C, it
retrieves its locally stored list of nearby core servers for
topic C, selects from this list a nearby core server S', and
transmits a copy of the locate message M over a virtual
point-to-point connection to core server S'. If this
transmission fails, proxy server S repeats the procedure with
other core servers on its list. Upon receiving a reply, it
forwards this reply to address A_r.
3. If proxy server S is a core server for cluster C, and it is
able to answer query Q using locally stored information, then
it transmits a "positive" reply to A_r containing the answer.
4. If proxy server S is a core server for topic C, but it is
unable to answer query Q using locally stored information,
then it carries out a parallel depth-first search by executing
the following steps:
(a) Set L to be the empty list.
(b) Retrieve the locally stored subtree of MT(C). For each
server Si directly linked to S in this subtree, other than
S_last (if specified), add the ordered pair (Si, S) to the
list L.
(c) If L is empty, transmit a "negative" reply to address A_r
saying that server S cannot locate an answer to query Q, and
terminate the execution of step 4; otherwise proceed to step
(d).
(d) Select a list L1 of one or more server pairs (Ai, Bi) from
the list L. For each server pair (Ai, Bi) on the list L1, form a
locate message M(Ai, Bi), which is a copy of message M whose
S_last field has been modified to specify Bi, and transmit this
message M(Ai, Bi) to server Ai over a virtual point-to-point
connection.
(e) For each reply received (by S) to a message sent in step
(d), act as follows: (i) If a "positive" reply arrives to a
locate message M(Ai, Bi), then forward this reply to A_r and
terminate step 4 immediately. (ii) If a "negative" reply
arrives to a locate message M(Ai, Bi), then remove the pair
(Ai, Bi) from the list L1. (iii) If the message M(Ai, Bi) could
not be successfully delivered to Ai, then remove the pair
(Ai, Bi) from the list L1, and add the pair (Ci, Ai) to the list
L1 for each Ci other than Bi that is directly linked to Ai in
the locally stored subtree of MT(C).
(f) Once L1 no longer contains any pair (Ai, Bi) for which a
message M(Ai, Bi) has been sent, or after a fixed period of
time has elapsed, return to step (c).

Retrieving Files from a Multicast Tree

When a processor q in the network wishes to retrieve the
file associated with a given target object, it executes the
following steps. These steps are initiated by an entity E,
which may be either a user entering commands via a
keyboard at a client q, as illustrated in FIG. 3, or an
automatic software process resident on a client or server
processor q.
1. Processor q forms a query Q that asks whether the recipient
(a core server for cluster C) still stores a file F that was
previously multicast to the multicast tree MT(C); if so, the
recipient server should reply with its own server name. Note
that processor q must already know the name of file F and the
identity of cluster C; typically, this information is provided
to entity E by a service such as the news clipping service or
browsing system described below, which must identify files to
the user by (name, multicast topic) pair.
2. Processor q forms a query message M that poses query Q to
the multicast tree MT(C).
3. Processor q pseudonymously transmits message M to the
user's proxy server D, as described above.
4. Processor q receives a response M2 to message M.
5. If the response M2 is "positive," that is, it names a server
S that still stores file F, then processor q pseudonymously
instructs the user's proxy server D to retrieve file F from
server S. If the retrieval fails because server S has deleted
file F since it answered the query, then client q returns to
step 1.
6. If the response M2 is "negative," that is, it indicates that
no server in MT(C) still stores file F, then processor q forms
a query Q that asks the recipient for the address A of the
entity that maintains file F; this entity will ordinarily
maintain a copy of file F indefinitely. All core servers in
MT(C) ordinarily retain this information (unless instructed to
delete it by the maintaining entity), even if they delete file
F for space reasons. Therefore, processor q should receive a
response providing address A, whereupon processor q
pseudonymously instructs the user's proxy server D to retrieve
file F from address A.

When multiple versions of a file F exist on local servers
throughout the data communication network N, but are not
marked as alternate versions of the same file, the system's
ability to rapidly locate files similar to F (by treating them
as target objects and applying the methods disclosed in
"Searching for Target Objects" above) makes it possible to
find all the alternate versions, even if they are stored
remotely. These related data files may then be reconciled by
any method. In a simple instantiation, all versions of the data
file would be replaced with the version that had the latest
date or version number. In another instantiation, each
version would be automatically annotated with references or
pointers to the other versions.

NEWS CLIPPING SERVICE

The system for customized electronic identification of
desirable objects of the present invention can be used in the
electronic media system of FIG. 1 to implement an automatic
news clipping service which learns to select (filter)
news articles to match a user's interests, based solely on
which articles the user chooses to read. The system for
customized electronic identification of desirable objects
generates a target profile for each article that enters the
electronic media system, based on the relative frequency of
occurrence of the words contained in the article. The system
for customized electronic identification of desirable objects
also generates a search profile set for each user, as a function
of the target profiles of the articles the user has accessed and
the relevance feedback the user has provided on these
articles. As new articles are received for storage on the mass
storage systems SS1-SSm of the information servers I1-Im,
the system for customized electronic identification of
desirable objects generates their target profiles. The generated
target profiles are later compared to the search profiles in the
users' search profile sets, and those new articles whose
target profiles are closest (most similar) to the closest search
profile in a user's search profile set are identified to that user
for possible reading. The computer program providing the
articles to the user monitors how much the user reads (the
number of screens of data and the number of minutes spent
reading), and adjusts the search profiles in the user's search
profile set to more closely match what the user apparently
prefers to read. The details of the method used by this system
are disclosed in flow diagram form in FIG. 5. This method
requires selecting a specific method of calculating
user-specific search profile sets, of measuring similarity
between two profiles, and of updating a user's search profile
set (or more generally target profile interest summary) based
on what the user read, and the examples disclosed herein are
examples of the many possible implementations that can be
used and should not be construed to limit the scope of the
system.

Initialize Users' Search Profile Sets 

The news clipping service instantiates target profile
interest summaries as search profile sets, so that a set of
high-interest search profiles is stored for each user. The search
profiles associated with a given user change over time. As in
any application involving search profiles, they can be
initially determined for a new user (or explicitly altered by an
existing user) by any of a number of procedures, including
the following preferred methods: (1) asking the user to
specify search profiles directly by giving keywords and/or
numeric attributes, (2) using copies of the profiles of target
objects or target clusters that the user indicates are
representative of his or her interest, (3) using a standard set of
search profiles copied or otherwise determined from the
search profile sets of people who are demographically
similar to the user.

Retrieve New Articles from Article Source

Articles are available on-line from a wide variety of
sources. In the preferred embodiment, one would use the
current day's news as supplied by a news source, such as the
AP or Reuters news wire. These news articles are input to the
electronic media system by being loaded into the mass
storage system SS4 of an information server S4. The article
profile module 201 of the system for customized electronic
identification of desirable objects can reside on the
information server S4 and operates pursuant to the steps
illustrated in the flow diagram of FIG. 5, where, as each article
is received at step 501 by the information server S4, the
article profile module 201 at step 502 generates a target
profile for the article and stores the target profile in an article
indexing memory (typically part of mass storage system SS4)
for later use in selectively delivering articles to users. This
method is equally useful for selecting which articles to read
from electronic news groups and electronic bulletin boards,
and can be used as part of a system for screening and
organizing electronic mail ("e-mail").

Calculate Article Profiles

A target profile is computed for each new article, as
described earlier. The most important attribute of the target
profile is a textual attribute that stands for the entire text of
the article. This textual attribute is represented as described
earlier, as a vector of numbers, which numbers in the
preferred embodiment include the relative frequencies
(TF/IDF scores) of word occurrences in this article relative to
other comparable articles. The server must count the
frequency of occurrence of each word in the article in order to
compute the TF/IDF scores.

These news articles are then hierarchically clustered in a
hierarchical cluster tree at step 503, which serves as a
decision tree for determining which news articles are closest
to the user's interest. The resulting clusters can be viewed as
a tree in which the top of the tree includes all target objects
and branches further down the tree represent divisions of the
set of target objects into successively smaller subclusters of
target objects. Each cluster has a cluster profile, so that at
each node of the tree, the average target profile (centroid) of
all target objects stored in the subtree rooted at that node is
stored. This average of target profiles is computed over the
representation of target profiles as vectors of numeric
attributes, as described above.

Compare Current Articles' Target Profiles to a User's Search
Profiles

The process by which a user employs this apparatus to
retrieve news articles of interest is illustrated in flow
diagram form in FIG. 11. At step 1101, the user logs into the
data communication network N via their client processor C1
and activates the news reading program. This is
accomplished by the user establishing a pseudonymous data
communications connection as described above to a proxy server
S2, which provides front-end access to the data
communication network N. The proxy server S2 maintains a list of
authorized pseudonyms and their corresponding public keys
and provides access and billing control. The user has a
search profile set stored in the local data storage medium on
the proxy server S2. When the user requests access to "news"
at step 1102, the profile matching module 203 resident on
proxy server S2 sequentially considers each search profile p_k
from the user's search profile set to determine which news
articles are most likely of interest to the user. The news
articles were automatically clustered into a hierarchical
cluster tree at an earlier step so that the determination can be
made rapidly for each user. The hierarchical cluster tree
serves as a decision tree for determining which articles'
target profiles are most similar to search profile p_k: the
search for relevant articles begins at the top of the tree, and
at each level of the tree the branch or branches are selected
which have cluster profiles closest to p_k. This process is
recursively executed until the leaves of the tree are reached,
identifying individual articles of interest to the user, as
described in the section "Searching for Target Objects"
above.

A variation on this process exploits the fact that many
users have similar interests. Rather than carry out steps 5-9
of the above process separately for each search profile of
each user, it is possible to achieve added efficiency by
carrying out these steps only once for each group of similar
search profiles, thereby satisfying many users' needs at
once. In this variation, the system begins by
non-hierarchically clustering all the search profiles in the search
profile sets of a large number of users. For each cluster k of
search profiles, with cluster profile p_k, it uses the method
described in the section "Searching for Target Objects" to
locate articles with target profiles similar to p_k. Each located
article is then identified as of interest to each user who has
a search profile represented in cluster k of search profiles.
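
The tree descent described in this section can be illustrated
with a minimal Python sketch. It is not the implementation
disclosed in the patent: the names Node, distance and
find_articles, the branches parameter, and the use of plain
Euclidean distance in place of the similarity measure
described earlier are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Centroid for a cluster node, or the target profile itself for a leaf.
    profile: list
    children: list = field(default_factory=list)
    article: str = None  # hypothetical article id, set only on leaves


def distance(a, b):
    """Euclidean distance; an assumed stand-in for the similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_articles(node, search_profile, branches=1):
    """Descend from node, following the `branches` closest children per level."""
    if not node.children:
        return [node.article]
    ranked = sorted(node.children,
                    key=lambda c: distance(c.profile, search_profile))
    found = []
    for child in ranked[:branches]:
        found.extend(find_articles(child, search_profile, branches))
    return found


# Toy two-level cluster tree with two-attribute profiles.
sports = Node([0.9, 0.1], [Node([1.0, 0.0], article="match-report"),
                           Node([0.8, 0.2], article="transfer-news")])
finance = Node([0.1, 0.9], [Node([0.0, 1.0], article="bond-yields"),
                            Node([0.2, 0.8], article="earnings")])
root = Node([0.5, 0.5], [sports, finance])

print(find_articles(root, [0.1, 0.95]))
```

With branches=1 the search follows a single path and returns
one article; a larger value trades speed for recall by
following several branches at each level, as the text allows
("the branch or branches are selected").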

Notice that the above variation attempts to match clusters
of search profiles with similar clusters of articles. Since this
is a symmetrical problem, it may instead be given a
symmetrical solution, as the following more general variation
shows. At some point before the matching process
commences, all the news articles to be considered are
clustered into a hierarchical tree, termed the "target profile
cluster tree," and the search profiles of all users to be
considered are clustered into a second hierarchical tree,
termed the "search profile cluster tree." The following steps
serve to find all matches between individual target profiles
from any target profile cluster tree and individual search
profiles from any search profile cluster tree:
1. For each child subtree S of the root of the search profile
   cluster tree (or, let S be the entire search profile cluster
   tree if it contains only one search profile):
2.   Compute the cluster profile P_S to be the average of all
     search profiles in subtree S
3.   For each subcluster (child subtree) T of the root of the
     target profile cluster tree (or, let T be the entire target
     profile cluster tree if it contains only one target profile):
4.     Compute the cluster profile P_T to be the average of all
       target profiles in subtree T
5.     Calculate the distance between P_S and P_T
6.     If d(P_S, P_T)<t, a threshold,
7.       If S contains only one search profile and T contains
         only one target profile, declare a match between that
         search profile and that target profile,
8.       otherwise recurse to step 1 to find all matches between
         search profiles in tree S and target profiles in tree T.

The threshold used in step 6 is typically an affine function
or other function of the greater of the cluster variances (or
cluster diameters) of S and T. Whenever a match is declared
between a search profile and a target profile, the target object
that contributed the target profile is identified as being of
interest to the user who contributed the search profile. Notice
that the process can be applied even when the set of users to
be considered or the set of target objects to be considered is
very small. In the case of a single user, the process reduces
to the method given for identifying articles of interest to a
single user. In the case of a single target object, the process
constitutes a method for identifying users to whom that
target object is of interest.
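
The symmetrical matching procedure above can be sketched in
Python as follows. This is an illustrative sketch under stated
assumptions, not the patented implementation: trees are
represented as nested lists whose leaves are (id, profile)
tuples, distance is Euclidean, and the threshold is taken to be
the affine function t = base + slope * max(radius(S),
radius(T)), where radius is a hypothetical stand-in for the
cluster diameter. All function and parameter names are
invented for the example.

```python
import math


def leaves(tree):
    """Yield (id, profile) pairs; a leaf is a tuple, a cluster is a list."""
    if isinstance(tree, tuple):
        yield tree
    else:
        for sub in tree:
            yield from leaves(sub)


def centroid(tree):
    """Average of all profiles stored under the tree (steps 2 and 4)."""
    profs = [p for _, p in leaves(tree)]
    return [sum(col) / len(profs) for col in zip(*profs)]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def radius(tree):
    """Largest distance from the centroid to any profile in the tree."""
    c = centroid(tree)
    return max(dist(c, p) for _, p in leaves(tree))


def parts(tree):
    """Child subtrees of the root, or the tree itself if it is a single leaf."""
    return [tree] if isinstance(tree, tuple) else tree


def match(search_tree, target_tree, base=0.2, slope=2.0):
    """Return (search id, target id) pairs, pruning distant cluster pairs."""
    matches = []
    for S in parts(search_tree):                      # step 1
        p_s = centroid(S)                             # step 2
        for T in parts(target_tree):                  # step 3
            p_t = centroid(T)                         # step 4
            t = base + slope * max(radius(S), radius(T))
            if dist(p_s, p_t) < t:                    # steps 5-6
                if isinstance(S, tuple) and isinstance(T, tuple):
                    matches.append((S[0], T[0]))      # step 7
                else:
                    matches.extend(match(S, T, base, slope))  # step 8
    return matches


users = [[("alice", [1.0, 0.0]), ("bob", [0.9, 0.1])], ("carol", [0.0, 1.0])]
articles = [[("a1", [1.0, 0.05]), ("a2", [0.5, 0.5])], ("a3", [0.05, 0.95])]
print(match(users, articles))
```

With slope=2.0, the triangle inequality guarantees that any
pair of individual profiles closer than base is never pruned at
the cluster level, since the two centroids can be at most
base plus the two radii apart. A smaller slope prunes more
aggressively at the risk of missing some matches.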
Present List of Articles to User 

Once the profile correlation step is completed for a
selected user or group of users, at step 1104 the profile
processing module 203 stores a list of the identified articles
for presentation to each user. At a user's request, the profile
processing system 203 retrieves the generated list of relevant
articles and presents this list of titles of the selected articles
to the user, who can then select at step 1105 any article for
viewing. (If no titles are available, then the first sentence(s)
of each article can be used.) The list of article titles is sorted
according to the degree of similarity of the article's target
profile to the most similar search profile in the user's search
profile set. The resulting sorted list is either transmitted in
real time to the user client processor C1, if the user is present
at their client processor C1, or can be transmitted to a user's
mailbox, resident on the user's client processor C1 or stored
within the server for later retrieval by the user; other
methods of transmission include facsimile transmission of
the printed list or telephone transmission by means of a
text-to-speech system. The user can then transmit a request
by computer, facsimile, or telephone to indicate which of the
identified articles the user wishes to review, if any. The user
can still access all articles in any information server S4 to
which the user has authorized access; however, those lower
on the generated list are simply further from the user's
interests, as determined by the user's search profile set. The
server retrieves the article from the local data storage
medium or from an information server and presents the
article one screen at a time to the user's client processor C1.
The user can at any time select another article for reading or
exit the process.

Monitor Which Articles Are Read

The user's search profile set generator 202 at step 1107
monitors which articles the user reads, keeping track of how
many pages of text are viewed by the user, how much time
is spent viewing the article, and whether all pages of the
article were viewed. This information can be combined to
measure the depth of the user's interest in the article,
yielding a passive relevance feedback score, as described
earlier. Although the exact details depend on the length and
nature of the articles being searched, a typical formula might
be:

measure of article attractiveness =
  0.2 if the second page is accessed
  + 0.2 if all pages are accessed
  + 0.2 if more than 30 seconds was spent on the article
  + 0.2 if more than one minute was spent on the article
  + 0.2 if the minutes spent in the article are greater than
    half the number of pages.

The computed measure of article attractiveness can then
be used as a weighting function to adjust the user's search
profile set to thereby more accurately reflect the user's
dynamically changing interests.
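
The typical formula above translates directly into a short
Python function. The function and parameter names are
hypothetical, and the sketch assumes that pages are viewed in
order, so that "the second page is accessed" is taken to mean
that at least two pages were viewed.

```python
def article_attractiveness(pages_viewed, total_pages, seconds_spent):
    """Passive relevance feedback score in [0, 1], in steps of 0.2."""
    minutes = seconds_spent / 60
    conditions = [
        pages_viewed >= 2,            # second page accessed (assumed sequential)
        pages_viewed >= total_pages,  # all pages accessed
        seconds_spent > 30,           # more than 30 seconds on the article
        seconds_spent > 60,           # more than one minute on the article
        minutes > total_pages / 2,    # minutes exceed half the page count
    ]
    # Each satisfied condition contributes 0.2; round away float noise.
    return round(0.2 * sum(conditions), 1)


# A four-page article read in full over two and a half minutes.
print(article_attractiveness(pages_viewed=4, total_pages=4, seconds_spent=150))
```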
Update User Profiles

Updating of a user's generated search profile set can be
done at step 1108 using the method described in copending
U.S. patent application Ser. No. 08/346,425. When an article
is read, the server shifts each search profile in the set
slightly in the direction of the target profiles of those nearby
articles for which the computed measure of article
attractiveness was high. Given a search profile with attributes u_ik
from a user's search profile set, and a set of J articles
available with attributes d_jk (assumed correct for now),
where i indexes users, j indexes articles, and k indexes
attributes, user i would be predicted to pick a set of P distinct
articles to minimize the sum of d(u_i, d_j) over the chosen
articles j. The user's desired attributes u_ik and an article's
attributes d_jk would be some form of word frequencies such
as TF/IDF and potentially other attributes such as the source,
reading level, and length of the article, while d(u_i, d_j) is the
distance between these two attribute vectors (profiles) using
the similarity measure described above. If the user picks a
different set of P articles than was predicted, the user search
profile set generation module should try to adjust u and/or d
to more accurately predict the articles the user selected. In
particular, u_i and/or d_j should be shifted to increase their
similarity if user i was predicted not to select article j but did
select it, and perhaps also to decrease their similarity if user
i was predicted to select article j but did not. A preferred
method is to shift u for each wrong prediction that user i will
not select article j, using the formula:

u_ik' = u_ik - e(u_ik - d_jk)

Here u_i is chosen to be the search profile from user i's
search profile set that is closest to the target profile. If e is
positive, this adjustment increases the match between user
i's search profile set and the target profiles of the articles
user i actually selects, by making u_i closer to d_j for the case
where the algorithm failed to predict an article that the
viewer selected. The size of e determines how many
example articles one must see to change the search profile
substantially. If e is too large, the algorithm becomes
unstable, but for sufficiently small e, it drives u to its correct
value. In general, e should be proportional to the measure of
article attractiveness; for example, it should be relatively
high if user i spends a long time reading article j. One could
in theory also use the above formula to decrease the match
in the case where the algorithm predicted an article that the
user did not read, by making e negative in that case.
However, there is no guarantee that u will move in the
correct direction in that case. One can also shift the attribute
weights w_ik of user i by using a similar algorithm:

w_ik' = (w_ik - e|u_ik - d_jk|) / Σ_k (w_ik - e|u_ik - d_jk|)

This is particularly important if
one is combining word frequencies with other attributes. As
before, this increases the match if e is positive, for the case
where the algorithm failed to predict an article that the user
read, this time by decreasing the weights on those
characteristics for which the user's target profile u_i differs from the
article's profile d_j. Again, the size of e determines how many
example articles one must see to replace what was originally
believed. Unlike the procedure for adjusting u, one can also
make use of the fact that the above algorithm decreases the
match if e is negative, for the case where the algorithm
predicted an article that the user did not read. The
denominator of the expression prevents weights from shrinking to
zero over time by renormalizing the modified weights w_i' so
that they sum to one. Both u and w can be adjusted for each
article accessed. When e is small, as it should be, there is no
conflict between the two parts of the algorithm. The selected
user's search profile set is updated at step 1108.
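
The two update rules can be exercised with a small Python
sketch. It is an illustration under assumptions rather than
the method of the cited copending application: profiles and
weights are plain lists, distance is unweighted Euclidean, and
the step size is taken as e = rate * attractiveness, where
rate is a hypothetical tuning constant. All function names are
invented for the example.

```python
import math


def distance(u, d):
    """Unweighted Euclidean distance (assumed stand-in for d(u_i, d_j))."""
    return math.sqrt(sum((uk - dk) ** 2 for uk, dk in zip(u, d)))


def closest_profile(profile_set, d):
    """Index of the search profile in the set nearest to target profile d."""
    return min(range(len(profile_set)),
               key=lambda n: distance(profile_set[n], d))


def shift_profile(u, d, e):
    """u_k' = u_k - e * (u_k - d_k): moves u toward d when e > 0."""
    return [uk - e * (uk - dk) for uk, dk in zip(u, d)]


def shift_weights(w, u, d, e):
    """w_k' = (w_k - e|u_k - d_k|) / sum over k of the same expression."""
    raw = [wk - e * abs(uk - dk) for wk, uk, dk in zip(w, u, d)]
    total = sum(raw)
    return [r / total for r in raw]


def update_on_missed_article(profile_set, weights, d, attractiveness, rate=0.1):
    """Apply both rules after the user read an article that was not predicted."""
    e = rate * attractiveness            # e proportional to attractiveness
    n = closest_profile(profile_set, d)  # only the closest profile is shifted
    new_weights = shift_weights(weights, profile_set[n], d, e)
    new_set = list(profile_set)
    new_set[n] = shift_profile(profile_set[n], d, e)
    return new_set, new_weights


profiles = [[0.2, 0.8], [0.9, 0.1]]
weights = [0.5, 0.5]
profiles, weights = update_on_missed_article(profiles, weights, [0.6, 0.1], 1.0)
print(profiles, weights)
```

In this example only the second search profile moves, from
[0.9, 0.1] toward the article at [0.6, 0.1], and the weight on
the first attribute (where profile and article disagree) drops
slightly while the weights still sum to one. The sketch does
not guard against a large e driving a raw weight negative,
which is one reason the text recommends keeping e small.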
Further Applications of the Filtering Technology

The news clipping service may deliver news articles (or 
advertisements and coupons for purchasables) to off-line 
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users as well as to users who are on-line. Although the advertisements for that product, and a consumer who buys 
ofiF-line users may have no way of providing relevance a product apparently because of a particular advertisement 
feedback, the user profile of an off-line user U may be (for example, by using a coupon clipped from that 
similar to the profiles of on-line users, for example because advertisement) is deemed to have provided particularly high 
user U is demographically similar to these other users, and 5 relevance feedback on that advertisement. Such feedback 
the level of user U's interest in particular target objects can may be communicated to a proxy server by the consumer's 
therefore be estimated via the general interest-estimation client processor (if the consumer is making the purchase 
methods described earlier. In one application, the news electronically), by the retail vendor, or by the credit-card 
clipping service chooses a set of news articles (respectively, reader (at the vendor's establishment) that the consumer 
advertisements and coupons) that are predicted to be of lO uses to pay for the purchase. Given a database of such 
interest to user U, thereby determining the content of a relevance feedback, the disclosed technology is then used to 
customized newspaper (respectively, advertising/coupon match advertisements with those users who are most inter- 
circular) that may be printed and physically sent to user U ested in them; advertisements selected for a user are pre- 
via other methods. In general, the target objects included in sented to that user by any one of several means, including 
the printed document delivered to user U are those with the is electronic mail, automatic display on the users screen, or 
highest median predicted interest among a group G of users, printing them on a printer at a retail establishment where the 
where group G consists of either the single off-line user U, consxuner is paying for a purchase. The threshold distance 
a set of off-line users who are demographically similar to used to identify interest may be increased for a particular 
user U, or a set of off-line users who are in the same advertisement, causing the system to present that advertise- 
geographic area and thus on the same newspaper delivery 20 ment to more users, in accordance with the amount that the 
route. In a variation, user group G is clustered into several advertiser is willing to pay. 

subgroups G1 . . . Gk; an average user profile Pi is created A further use of the capabilities of this system is to 
from each subgroup Gi; for each article T and each user manage a user's investment portfolio. Instead of recom- 
profile Pi, the interest in T by a hypothetical user with user mending articles to the user, the system recommends target 
profile Pi is predicted, and the interest of article T to group objects that are investments. As illustrated above by the 
G is taken to be the maximum interest in article T by any of example of stock market investments, many different 
these k hypothetical users; finally, the customized newspa- attributes can be used together to profile each investment. 
per for user group G is constructed from those articles of The user's past investment behavior is characterized in the 
greatest interest to group G. user's search profile set or target profile interest summary, 
The filtering technology of the news clipping service is and this information is used to match the user with stock 
not limited to news articles provided by a single source, but opportunities (target objects) similar in nature to past invest- 
may be extended to articles or target objects collected from ments. The rapid profiling method described above may be 
any number of sources. For example, rather than identifying used to determine a rough set of preferences for new users. 
new news articles of interest, the technology may identify Quality attributes used in this system can include negatively 
new or updated World Wide Web pages of interest. In a weighted attributes, such as a measurement of fluctuations in 
second application, termed "broadcast clipping," where dividends historically paid by the investment, a quality 
individual users desire to broadcast messages to all inter- attribute that would have a strongly negative weight for a 
ested users, the pool of news articles is replaced by a pool conservative investor dependent on a regular flow of invest- 
of messages to be broadcast, and these messages are sent to ment income. Furthermore, the user can set filter parameters 
the broadcast-clipping-service subscribers most interested in so that the system can monitor stock prices and automati- 
them. In a third application, the system scans the transcripts cally take certain actions, such as placing buy or sell orders, 
of all real-time spoken or written discussions on the network or e-mailing or paging the user with a notification, when 
that are currently in progress and designated as public, and certain stock performance characteristics are met. Thus, the 
employs the news-clipping technology to rapidly identify system can immediately notify the user when a selected 
discussions that the user may be interested in joining, or to stock reaches a predetermined price, without the user having 
rapidly identify and notify users who may be interested in to monitor the stock market activity. The user's investments 
joining an ongoing discussion. In a fourth application, the can be profiled in part by a "type of investment" attribute (to 
system scans the transcripts of all real time spoken, written be used in conjunction with other attributes), which distin- 
or acoustic (e.g., audio or video streaming data) on the guishes among bonds, mutual funds, growth stocks, income 
network that are currently in progress, and employs news stocks, etc., to thereby segment the user's portfolio accord- 
clipping technology to rapidly identify content which is ing to investment type. Each investment type can then be 
most appropriate for a particular advertisement or promotion managed to identify investment opportunities and the user 
that may pertain to the target object profile of the content can identify the desired ratio of investment capital for each 
presently occurring. In a fifth application, the method is used type, e.g., in accordance with the system's automatic rec- 
as a post-process that filters and ranks in order of interest the ommendation for relative distribution of investment capital 
many target objects found by a conventional database as indicated by the relative level of user interest for each 
search, such as a search for all homes selling for under type. 

$200,000 in a given area, for all 1994 news articles about In one application the system may also keep track of the 

Marcia Clark, or for all Italian-language films. In a sixth most relevant articles for the user who may receive recom- 

application, the method is used to filter and rank the links in mendations also through notification (or paging for new 

a hypertext document by estimating the user's interest in the releases). In the previously described preferred 

document or other object associated with each link. In a implementation, the similarity of articles was described in 

seventh application, paying advertisers, who may be com- terms of the tendency of metrically similar users to read 

panies or individuals, are the source of advertisements or them where metric similarity of users is determined by the 

other messages, which take the place of the news articles in tendency of those users to read similar articles wherein 

the news clipping service. A consumer who buys a product feedback from all of the users is considered. In this appli- 

is deemed to have provided positive relevance feedback on cation however, only those articles which tend to be read by 
similar users which have a similar stock portfolio to that of the user are instead considered
similar. Accordingly, owners of stocks which are metrically similar to certain articles are
targeted with those articles. By applying similar techniques in this application to those herein
described, relevance feedback determines the metric similarity of the associative attributes
which is each stock, with the relevant associative attributes which are each article (or their
associated textual, descriptive or numeric attributes contained therein). Additionally in this
regard, it is also possible to bias the weighting values of users providing relevance feedback to
favor those who have invested in similar types of stocks and who have a proven track record of
success through their trading decisions. Another application for which this type of pre-adjusted
relevance feedback is useful is in recommending and/or automatically trading the most interesting
stocks to users using the present methods above described, however, again biasing the relevance
feedback to the system by those users who had been most successful in their past trading
decisions with regards to those particular types of stocks. Because financial advisors possess
varying degrees of skill which varies within different types of investments, such a collaborative
filtering based market for investment need not be limited to stocks but may extend to other types
of investments as well. The market price for which this "expert advice" is purchased by would-be
investors, who have an affinity to investments of the particular types that those advisors are
experts in, may be measured using the presently described techniques for determination of price
point; thus advice by a given expert for investments which had demonstrated a given level of
success may be priced similarly. Additionally, some gross level feedback suggesting the advisor's
current awareness about investment types could be automatically assessed by passively observing
which articles within which investment domains the user had been recently reading on-line. In
accordance with the similarity techniques previously described, the user may browse between the
genres of articles and stocks which are most relevant to one another. Because there are numerous
systems and software tools which are used in attempting to predict both selected stocks and
optimal times to buy or trade them, the current user customization techniques are best
implemented as an enhancement feature to provide the user with not only quality but also
personalization.

In the preferred implementation for an on-line newspaper or news filter, each of the above
capabilities for customized recommendation and notification of investment related articles, stock
recommendations and automated stock monitoring and trading features are provided to the user as
an integrated financial news and investment service.

Additionally, in accordance with the virtual communities section below described, users sharing
common portfolios may wish to correspond on-line about advice or experiences with other similar
users. Additionally, users who have a past track record of success may also be particularly
identifiable through these virtual communities in conjunction with their participation or their
comments and advice relating to specific stocks may be ascribed to those stocks, credentialed as
originating from an expert with a proven track record (and made publicly available).

OTHER ON-LINE NEWSPAPER INTERFACE FEATURES

In accordance with current on-line news interface features, several implementation features of
the present system include the following:

1. Automatically create a "customized newspaper". User profiling enabling custom recommendations
may be achieved by purely passive means of user activity data or if desired, it can refine and
automate the selection process of articles within user selected categories of interest as well as
recommend articles within different categories which the user is likely to prefer as evidenced
through past behaviors. Applications include:
- Presentation of new articles and corresponding advertisements which are of highest interest to
the user.
- Recommending (highlighting) these articles from the directory.
2. A customized search engine which offers search results tailored and relevancy ranked to user
preferences.
3. Using a survey for off-line users for subsequent issues, an inserted card inserted into each
issue identifies or prioritizes the most interesting articles/ads.

E-Mail Notification

An important and novel characteristic of the architecture is the ability to identify new or
updated target objects that are relevant to the user, as determined by the user's search profile
set or target profile interest summary. ("Updated target objects" include revised versions of
documents and new models of purchasable goods.) The system may notify the user of these relevant
target objects by an electronic notification such as an e-mail message or facsimile transmission.
In the variation where the system sends an e-mail message, the user's e-mail filter can then
respond appropriately to the notification, for instance, by bringing the notification immediately
to the user's personal attention, or by automatically submitting an electronic request to
purchase the target object named in the notification. A simple example of the latter response is
for the e-mail filter to retrieve an on-line document at a nominal or zero charge, or request to
buy a purchasable of limited quantity such as a used product or an auctionable.

ACTIVE NAVIGATION (BROWSING)

Browsing by Navigating Through a Cluster Tree

A hierarchical cluster tree imposes a useful organization on a collection of target objects. The
tree is of direct use to a user who wishes to browse through all the target objects in the tree.
Such a user may be exploring the collection with or without a well-specified goal. The tree's
division of target objects into coherent clusters provides an efficient method whereby the user
can locate a target object of interest. The user first chooses one of the highest level (largest)
clusters from a menu, and is presented with a menu listing the subclusters of said cluster,
whereupon the user may select one of these subclusters. The system locates the subcluster, via
the appropriate pointer that was stored with the larger cluster, and allows the user to select
one of its subclusters from another menu. This process is repeated until the user comes to a leaf
of the tree, which yields the details of an actual target object. Hierarchical trees allow rapid
selection of one target object from a large set. In ten menu selections from menus of ten items
(subclusters) each, one can reach 10^10 = 10,000,000,000 (ten billion) items. In the preferred
embodiment, the user views the menus on a computer screen or terminal screen and selects from
them with a keyboard or mouse. However, the user may also make selections over the telephone,
with a voice synthesizer reading the menus and the user selecting subclusters via the telephone's
touch-tone keypad. In another variation, the user simultaneously maintains two connections to the
server, a telephone voice connection and a fax connection; the server sends successive menus to
the user by fax, while the user selects choices via the telephone's touch-tone keypad.
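
The menu-by-menu descent through the cluster tree described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the `ClusterNode` class, the `navigate` and `reachable_leaves` functions, and the sample labels are hypothetical names introduced here, and each menu selection is supplied as a list index rather than read from a keyboard, mouse, or touch-tone keypad.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterNode:
    """One node of a hierarchical cluster tree (hypothetical structure)."""
    label: str
    children: List["ClusterNode"] = field(default_factory=list)  # subclusters
    target_object: Optional[str] = None  # set only on leaves

    def is_leaf(self) -> bool:
        return not self.children


def navigate(root: ClusterNode, choices: List[int]) -> ClusterNode:
    """Apply one menu selection per level; stop at a leaf or when choices run out."""
    node = root
    for choice in choices:
        if node.is_leaf():
            break
        node = node.children[choice]  # follow the stored pointer to the subcluster
    return node


def reachable_leaves(menu_size: int, selections: int) -> int:
    """Leaves reachable in a full tree where every menu has menu_size items."""
    return menu_size ** selections


# Assumed sample data for illustration only.
tree = ClusterNode("all", [
    ClusterNode("sports", [
        ClusterNode("article A", target_object="A"),
        ClusterNode("article B", target_object="B"),
    ]),
    ClusterNode("humor", [
        ClusterNode("article C", target_object="C"),
    ]),
])

leaf = navigate(tree, [0, 1])
print(leaf.target_object)        # B
print(reachable_leaves(10, 10))  # 10000000000
```

The last line reproduces the arithmetic in the text: ten selections from ten-item menus reach ten billion leaves.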
Just as user profiles commonly include an associative attribute indicating the user's degree of
interest in each target object, it is useful to augment user profiles with an additional
associative attribute indicating the user's degree of interest in each cluster in the
hierarchical cluster tree. This degree of interest may be estimated numerically as the number of
subclusters or target objects the user has selected from menus associated with the given cluster
or its subclusters, expressed as a proportion of the total number of subclusters or target
objects the user has selected. This associative attribute is particularly valuable if the
hierarchical tree was built using "soft" or "fuzzy" clustering, which allows a subcluster or
target object to appear in multiple clusters: if a target document appears in both the "sports"
and the "humor" clusters, and the user selects it from a menu associated with the "humor"
cluster, then the system increases its association between the user and the "humor" cluster but
not its association between the user and the "sports" cluster.

Labeling Clusters

Since a user who is navigating the cluster tree is repeatedly expected to select one of several
subclusters from a menu, these subclusters must be usefully labeled (at step 503), in such a way
as to suggest their content to the human user. It is straightforward to include some basic
information about each subcluster in its label, such as the number of target objects the
subcluster contains (possibly just 1) and the number of these that have been added or updated
recently. However, it is also necessary to display additional information that indicates the
cluster's content. This content-descriptive information may be provided by a human, particularly
for large or frequently accessed clusters, but it may also be generated automatically. The basic
automatic technique is simply to display the cluster's "characteristic value" for each of a few
highly weighted attributes. With numeric attributes, this may be taken to mean the cluster's
average value for that attribute: thus, if the "year of release" attribute is highly weighted in
predicting which movies a user will like, then it is useful to display average year of release as
part of each cluster's label. Thus the user sees that one cluster consists of movies that were
released around 1962, while another consists of movies from around 1982. For short textual
attributes, such as "title of movie" or "title of document," the system can display the
attribute's value for the cluster member (target object) whose profile is most similar to the
cluster's profile (the mean profile for all members of the cluster), for example, the title of
the most typical movie in the cluster. For longer textual attributes, a useful technique is to
select those terms for which the amount by which the term's average TF/IDF score across members
of the cluster exceeds the term's average TF/IDF score across all target objects is greatest,
either in absolute terms or else as a fraction of the standard deviation of the term's TF/IDF
score across all target objects. The selected terms are replaced with their morphological stems,
eliminating duplicates (so that if both "slept" and "sleeping" were selected, they would be
replaced by the single term "sleep") and optionally eliminating close synonyms or collocates (so
that if both "nurse" and "medical" were selected, they might both be replaced by a single term
such as "nurse," "medical," "medicine," or "hospital"). The resulting set of terms is displayed
as part of the label. Finally, if freely redistributable thumbnail photographs or other graphical
images are associated with some of the target objects in the cluster for labeling purposes, then
the system can display as part of the label the image or images whose associated target objects
have target profiles most similar to the cluster profile.

Users' navigational patterns may provide some useful feedback as to the quality of the labels. In
particular, if users often select a particular cluster to explore, but then quickly backtrack and
try a different cluster, this may signal that the first cluster's label is misleading. Insofar as
other terms and attributes can provide "next-best" alternative labels for the first cluster, such
"next-best" labels can be automatically substituted for the misleading label. In addition, any
user can locally relabel a cluster for his or her own convenience. Although a cluster label
provided by a user is in general visible only to that user, it is possible to make global use of
these labels via a "user labels" textual attribute for target objects, which attribute is defined
for a given target object to be the concatenation of all labels provided by any user for any
cluster containing that target object. This attribute influences similarity judgments: for
example, it may induce the system to regard target articles in a cluster often labeled "Sports
News" by users as being mildly similar to articles in an otherwise dissimilar cluster often
labeled "International News" by users, precisely because the "user labels" attribute in each
cluster profile is strongly associated with the term "News." The "user label" attribute is also
used in the automatic generation of labels, just as other textual attributes are, so that if the
user-generated labels for a cluster often include "Sports," the term "Sports" may be included in
the automatically generated label as well.

It is not necessary for menus to be displayed as simple lists of labeled options; it is possible
to display or print a menu in a form that shows in more detail the relation of the different menu
options to each other. Thus, in a variation, the menu options are visually laid out in two
dimensions or in a perspective drawing of three dimensions. Each option is displayed or printed
as a textual or graphical label. The physical coordinates at which the options are displayed or
printed are generated by the following sequence of steps: (1) construct for each option the
cluster profile of the cluster it represents, (2) construct from each cluster profile its
decomposition into a numeric vector, as described above, (3) apply singular value decomposition
(SVD) to determine the set of two or three orthogonal linear axes along which these numeric
vectors are most greatly differentiated, and (4) take the coordinates of each option to be the
projected coordinates of that option's numeric vector along said axes. Step (3) may be varied to
determine a set of, say, 6 axes, so that step (4) lays out the options in a 6-dimensional space;
in this case the user may view the geometric projection of the 6-dimensional layout onto any
plane passing through the origin, and may rotate this viewing plane in order to see differing
configurations of the options, which emphasize similarity with respect to differing attributes in
the profiles of the associated clusters. In the visual representation, the sizes of the cluster
labels can be varied according to the number of objects contained in the corresponding clusters.
In a further variation, all options from the parent menu are displayed in some number of
dimensions, as just described, but with the option corresponding to the current menu replaced by
a more prominent subdisplay of the options on the current menu; optionally, the scale of this
composite display may be gradually increased over time, thereby increasing the area of the screen
devoted to showing the options on the current menu, and giving the visual impression that the
user is regarding the parent cluster and "zooming in" on the current cluster and its subclusters.

Further Navigational

It should be appreciated that a hierarchical cluster-tree may be configured with multiple cluster
selections branching from each node or the same labeled clusters presented in
the form of single branches for multiple nodes ordered in a hierarchy. In one variation, the user
is able to perform lateral navigation between neighboring clusters as well, by requesting that
the system search for a cluster whose cluster profile resembles the cluster profile of the
currently selected cluster. If this type of navigation is performed at the level of individual
objects (leaf ends), then automatic hyperlinks may be then created as navigation occurs. This is
one way that nearest neighbor clustering navigation may be performed. For example, in a domain
where target objects are home pages on the World Wide Web, a collection of such pages could be
laterally linked to create a "virtual mall". Most importantly, links to sites in the form of
targeted advertisements may be temporarily generated (as a result of the user profile and the
target object profile of the page being visited, the dialogue being conducted or the content
being viewed, listened to or read at that moment). This is one way in which "on the fly"
automatic creation of customized links may occur (user specific linking of advertisers with sites
or other content including programming or joint ads or promotions between advertisers may occur
in real time). Or in another period this technique may be used to recommend the most befitting
sites and/or ads which should be linked together (based upon their similarity). Of course,
certain promotions for example may be directly competitive such as a product for two brands of
toothpaste. Such direct competitive overlap must thus be accounted for. This technique may also
account for one way or two way (exchanged) links between vendors. Advertisers which exchange
links or wish to link to a "prime location" would pay a price which is directly in accordance
with the market demand for that advertisement though not exceeding the price value necessary to
fill the available ad space. The techniques described in co-pending patent application entitled
"PPS" suggests a method of automatically generating a customized promotion (or joint promotion)
for individual users. A similar technique may be used to automatically establish a price for the
ad space (based on a combined predicted price per impression and predicted value for the average
customer expected to access that advertisement). As feedback occurs, this pricing model is
adjusted according to actual response feedback, and links may be broken or reformed in a one way
or two way context in automatic fashion as such.

The simplest way to use the automatic menuing system described above is for the user to begin
browsing at the top of the tree and moving to more specific subclusters. However, in a variation,
the user optionally provides a query consisting of textual and/or other attributes, from which
query the system constructs a profile in the manner described herein, optionally altering textual
attributes as described herein before decomposing them into numeric attributes. Query profiles
are similar to the search profiles in a user's search profile set, except that their attributes
are explicitly specified by a user, most often for one-time usage, and unlike search profiles,
they are not automatically updated to reflect changing interests. A typical query in the domain
of text articles might have "Tell me about the relation between Galileo and the Medici family" as
the value of its "text of article" attribute, and 8 as the value of its "reading difficulty"
attribute (that is, 8th-grade level). The system uses the method of section "Searching for Target
Objects" above to automatically locate a small set of one or more clusters with profiles similar
to the query profile, for example, the articles they contain are written at roughly an 8th-grade
level and tend to mention Galileo and the Medicis. The user may start browsing at any of these
clusters, and can move from it to subclusters, superclusters, and other nearby clusters. For a
user who is looking for something in particular, it is generally less efficient to start at the
largest cluster and repeatedly select smaller subclusters than it is to write a brief description
of what one is looking for and then to move to nearby clusters if the objects initially
recommended are not precisely those desired.

Although it is customary in information retrieval systems to match a query to a document, an
interesting variation is possible where a query is matched to an already answered question. The
relevant domain is a customer service center, electronic newsgroup, or Better Business Bureau
where questions are frequently answered. Each new question-answer pair is recorded for future
reference as a target object, with a textual attribute that specifies the question together with
the answer provided. As explained earlier with reference to document titles, the question should
be weighted more heavily than the answer when this textual attribute is decomposed into TF/IDF
scores. A query specifying "Tell me about the relation between Galileo and the Medici family" as
the value of this attribute therefore locates a cluster of similar questions together with their
answers. In a variation, each question-answer pair may be profiled with two separate textual
attributes, one for the question and one for the answer. A query might then locate a cluster by
specifying only the question attribute, or for completeness, both the question attribute and the
(lower-weighted) answer attribute, to be the text "Tell me about the relation between Galileo and
the Medici family."

The filtering technology described earlier can also aid the user in navigating among the target
objects. When the system presents the user with a menu of subclusters of a cluster C of target
objects, it can simultaneously present an additional menu of the most interesting target objects
in cluster C, so that the user has the choice of accessing a subcluster or directly accessing one
of the target objects. If this additional menu lists n target objects, then for each i between 1
and n inclusive, in increasing order, the i-th most prominent choice on this additional menu,
which choice is denoted Top(C,i), is found by considering all target objects in cluster C that
are further than a threshold distance t from all of Top(C,1), Top(C,2), ..., Top(C,i-1), and
selecting the one in which the user's interest is estimated to be highest. If the threshold
distance t is 0, then the menu resulting from this procedure simply displays the n most
interesting objects in cluster C, but the threshold distance may be increased to achieve more
variety in the target objects displayed. Generally the threshold distance t is chosen to be an
affine function or other function of the cluster variance or cluster diameter of the cluster C.

As a novelty feature, the user U can "masquerade" as another user V, such as a prominent
intellectual or a celebrity supermodel; as long as user U is masquerading as user V, the
filtering technology will recommend articles not according to user U's preferences, but rather
according to user V's preferences. Provided that user U has access to the user-specific data of
user V, for example because user V has leased these data to user U for a financial consideration,
then user U can masquerade as user V by instructing user U's proxy server S to temporarily
substitute user V's user profile and target profile interest summary for user U's. In a
variation, user U has access to an average user profile and a composite target profile interest
summary for a group G of users; by instructing proxy server S to substitute these for user U's
user-specific data, user U can masquerade as a typical member of group G, as is useful in
exploring group preferences for sociological, political, or market research. More generally, user
U may "partially masquerade" as
another user V or group G, by instructing proxy server S to indicate a temporary interest that is added to his or her usual 
temporarily replace user U's user-specific data with a interests. This is done by entering a query as described 
weighted average of user U's user-specific data and the above, i.e., a set of textual and other attributes that closely 
user-specific data for user V and group G. match the user's interests of the moment. This query 
Menu Organization becomes "active," and affects the system's determination of 

Although the topology of a hierarchical cluster tree is interest in either of two ways. In one approach, an active 
fixed by the techniques that build the tree, the hierarchical query is treated as if it were any other target object, and by 
menu presented to the user for the user's navigation need not virtue of being a query, it is taken to have received relevance 
be exactly isomorphic to the cluster tree. The menu is feedback that indicates especially high interest. In an alter- 
typically a somewhat modified version of the cluster tree, native approach, target objects X whose target profiles are 
reorganized manually or automatically so that the clusters similar to an active query's profile are simply considered to 
most interesting to a user are easily accessible by the user. have higher quality q(U, X), in that q(U, X) is incremented 
In order to automatically reorganize the menu in a user- by a term that increases with target object X's similarity to 
specific way, the system first attempts automatically to the query profile. Either strategy affects the usual interest 
identify existing clusters that are of interest to the user. The estimates: clusters that match user U's usual interests (and 
system may identify a cluster as interesting because the user have high quality q(*)) are still considered to be of interest, 
often accesses target objects in that cluster — or, in a more and clusters whose profiles are similar to an active query are 
sophisticated variation, because the user is predicted to have adjudged to have especially high interest. Clusters that are 
high interest in the cluster's profile, using the methods similar to both the query and the user's usual interests are 
disclosed herein for estimating interest from relevance feed- most interesting of all. The user may modify or deactivate an 
back. active query at any time while browsing. In addition, if the 

Several techniques can then be used to make interesting user discovers a target object or cluster X of particular 
clusters more easily accessible. The system can at the user's interest while browsing, he or she may replace or augment 
request or at all times display a special list of the most the original (perhaps vague) query profile with the target 
interesting clusters, or the most interesting subclusters of the profile of target object or cluster X, thereby amplifying or 
current cluster, so that the user can select one of these refining the original query to indicate a particular interest 
clusters based on its label and jump directly to it. In general, in objects similar to X. For example, suppose the user is 
when the system constructs a list of interesting clusters in browsing through documents, and specifies an initial query 
this way, the I** most prominent choice on the hst, which containing the word "Lloyd's," so that the system predicts 
choice is denoted Top(r), is found by considering all appro- 30 documents containing the word "Lloyd's" to be more inter- 
priate clusters C that are fairther than a threshold distance I esting and makes them more easily accessible, even to the 
from all of Top(l), Top(2), . . . Top(I-l), and selecting the point of listing such documents or cliisters of such 
one in which the user's interest is estimated to be highest. documents, as described above. In particular, certain articles 
Here the threshold distance t is optionally dependent on the about insurance containing the phrase "Lloyd's of London" 
computed cluster variance or cluster diameter of the profiles 35 are made more easily accessible, as are certain pieces of 
in the latter cluster. Several tedmiques that reorganize the Welsh fiction containing phrases like "Lloyd's father." The 
hierarchical menu tree arc also usefil. First, menus can be user browses while this query is active, and hits upon a 
reorganized so that the most interesting subcluster choices useful article describing the relation of Lloyd's of London to 
appear earliest on the menu, or are visually marked as other British insurance houses; by replacing or augmenting 
interesting; for example, their labels are displayed in a 40 the query with the full text of this article, the user can turn 
special color or type face, or are displayed together with a the attention of the system to other documents that resemble 
number or graphical image indicating the likely level of this article, such as documents about British insurance 
interest Second, interesting clusters can be moved to menus houses, rather than Welsh folk tales, 
higher in the tree, i.e., closer to the root of the tree, so that In a system where queries are used, it is useful to include 
they are easier to access if the user starts browsing at the root 45 in the target profiles an associative attribute that records the 
of the tree. Third, uninteresting clusters can be moved to associations between a target object and whatever terms are 
menus lower in the tree, to make room for interesting employed in queries used to find that target object. The 
clusters that are being moved higher. Fourth, clusters with an association score of target object X with a particular query- 
especially low interest score (representing active dislike) can term T is defined to be the mean relevance feedback on 
simply be suppressed from the menus; thus, a user with 50 target object X, averaged over just those accesses of target 
children may assign an extremely negative weight to the object X that were made \^iiile a query containing term T 
"vulgarity" attribute in the determination of q, so that vulgar was active, multiplied by the negated logarithm of term T's 
clusters and documents will not be available at all. As the global frequency in all queries. The effect of this associative 
interesting clusters and the documents in them migrate attribute is to increase the measured similarity of two 
toward the top of the tree, a customized tree develops that 55 documents if they are good responses to queries that contain 
can be more efiSciently navigated by the particular user. If the same terms. A further maneuver can be used to improve 
menus arc chosen so that each menu item is chosen with the accuracy of responses to a query: in the summation used 
approximately equal probability, then the expected number to determine the quahty q(U, )Q of a target object X, a term 
of choices the user has to make is minimized. If, for is included that is proportional to the sum of association 
example, a user frequently accessed target objects whose 60 scores between target object X and each tenn in the active 
profiles resembled the cluster profile of cluster (a, b, d) in query, if any, so that target objects that are closely associated 
FIG. 8 then the menu in FIG. 9 could be modified to show with terms in an active query are determined to have higher 
the structure illustrated in FIG. 10. quality and therefore higher interest for the user. To comple- 

In the variation where the general techniques disclosed ment the system's automatic reorganization of the hierar- 

herein for estimating a user's interest from relevance feed- 65 chical cluster tree, the user can be given the ability to 

back are used to identify interesting clusters, it is possible reorganize the tree manually, as he or she sees fit Any 

for a user U to supply "temporary relevance feedback" to changes are optionally saved on the user's local storage 
device so that they will affect the presentation of the tree in future
sessions. For example, the user can choose to move or copy menu options
to other menus, so that useful clusters can thereafter be chosen
directly from the root menu of the tree or from other easily accessed
or topically appropriate menus. In another example, the user can select
clusters C1, C2, ... Ck listed on a particular menu M and choose to
remove these clusters from the menu, replacing them on the menu with a
single aggregate cluster M' containing all the target objects from
clusters C1, C2, ... Ck. In this case the immediate subclusters of new
cluster M' are either taken to be clusters C1, C2, ... Ck themselves,
or else, in a variation similar to the "scatter-gather" method, are
automatically computed by clustering the set of all the subclusters of
clusters C1, C2, ... Ck according to the similarity of the cluster
profiles of these subclusters.

Electronic Mall

In one application, the browsing techniques described above may be
applied to a domain where the target objects are purchasable goods.
When shoppers look for goods to purchase over the Internet or other
electronic media, it is typically necessary to display thousands or
tens of thousands of products in a fashion that helps consumers find
the items they are looking for. The current practice is to use
hand-crafted menus and sub-menus in which similar items are grouped
together. It is possible to use the automated clustering and browsing
methods described above to more effectively group and present the
items. Purchasable items can be hierarchically clustered using a
plurality of different criteria. Useful attributes for a purchasable
item include but are not limited to a textual description and
predefined category labels (if available), the unit price of the item,
and an associative attribute listing the users who have bought this
item in the past. Also useful is an associative attribute indicating
which other items are often bought on the same shopping "trip" as this
item; items that are often bought on the same trip will be judged
similar with respect to this attribute, so tend to be grouped together.
Retailers may be interested in utilizing a similar technique for
purposes of predicting both the nature and relative quantity of items
which are likely to be popular to their particular clientele. This
prediction may be made by using aggregate purchasing records as the
search profile set from which a collection of target objects is
recommended. Estimated customer demand which is indicative of
(relative) inventory quantity for each target object item is determined
by measuring the cluster variance of that item compared to another
target object item (which is in stock).

As described above, hierarchically clustering the purchasable target
objects results in a hierarchical menu system, in which the target
objects or clusters of target objects that appear on each menu can be
labeled by names or icons and displayed in a two-dimensional or
three-dimensional menu in which similar items are displayed physically
near each other or on the same graphically represented "shelf." As
described above, this grouping occurs both at the level of specific
items (such as standard size Ivory soap or large Breck shampoo) and at
the level of classes of items (such as soaps and shampoos). When the
user selects a class of items (for instance, by clicking on it), then
the more specific level of detail is displayed. It is neither necessary
nor desirable to limit each item to appearing in one group; customers
are more likely to find an object if it is in multiple categories.
Non-purchasable objects such as artwork, advertisements, and free
samples may also be added to a display of purchasable objects, if they
are associated with (liked by) substantially the same users as are the
purchasable objects in the display.

Network Context of the Browsing System

The files associated with target objects are typically distributed
across a large number of different servers S1-So and clients C1-Cn.
Each file has been entered into the data storage medium at some server
or client in any one of a number of ways [...] transmission, automatic
synthesis [...] another computer program. While a system to enable
users to efficiently [...] hierarchical cluster tree on a [...]
centralized machine, greater efficiency can be achieved if the storage
of the hierarchical cluster tree is distributed across many machines in
the network. Each cluster C, including single-member clusters (target
objects), is digitally represented by a file F, which is multicast to a
topical multicast tree MT(C1); here cluster C1 is either cluster C
itself or some supercluster of cluster C. In this way, file F is stored
at multiple servers, for redundancy. The file F that represents cluster
C contains at least the following data:

1. The cluster profile for cluster C, or data sufficient to reconstruct
this cluster profile.
2. The number of target objects contained in cluster C.
3. A human-readable label for cluster C, as described in section
"Labeling Clusters" above.
4. If the cluster is divided into subclusters, a list of pointers to
files representing the subclusters. Each pointer is an ordered pair
naming, first, a file, and second, a multicast tree or a specific
server where that file is stored.
5. If the cluster consists of a single target object, a pointer to the
file corresponding to that target object.

The process by which a client machine can retrieve the file F from the
multicast tree MT(C1) is described above in section "Retrieving Files
from a Multicast Tree." Once it has retrieved file F, the client can
perform further tasks pertaining to this cluster, such as displaying a
labeled menu of subclusters, from which the user may select subclusters
for the client to retrieve next.

The advantage of this distributed implementation is threefold. First,
the system can be scaled to larger cluster sizes and numbers of target
objects, since much more searching and data retrieval can be carried
out concurrently. Second, the system is fault-tolerant in that partial
matching can be achieved even if portions of the system are temporarily
unavailable. It is important to note here the robustness due to
redundancy inherent in our design: data is replicated at tree sites so
that even if a server is down, the data can be located elsewhere.

The distributed hierarchical cluster tree can be created in a
distributed fashion, that is, with the participation of many
processors. Indeed, in most applications it should be recreated from
time to time, because as users interact with target objects, the
associative attributes in the target profiles of the target objects
change to reflect these interactions; the system's similarity
measurements can therefore take these interactions into account when
judging similarity, which allows a more perspicuous cluster tree to be
built. The key technique is the following procedure for merging n
disjoint cluster trees, represented respectively by files F1 ... Fn in
distributed fashion as described above, into a combined cluster tree
that contains all the target objects from all these trees. The files
F1 ... Fn are described above, except that the cluster labels are not
included in the representation. The following steps are executed by a
server S1, in response to a request message from another server S0,
which request message includes pointers to the files F1 ... Fn.
1. Retrieve files F1 ... Fn.
2. Let L and M be empty lists.
3. For each file Fi from among F1 ... Fn:
4. If file Fi contains pointers
to subcluster files, add these pointers to list L.
5. If file Fi represents a single target object, add a pointer to file
Fi to list L.
6. For each pointer X on list L, retrieve the file that pointer X
points to and extract the cluster profile P(X) that this file stores.
7. Apply a clustering algorithm to group the pointers X on list L
according to the distances between their respective cluster profiles
P(X).
8. For each (nonempty) resulting group C of pointers:
9. If C contains only one pointer, add this pointer to list M;
10. otherwise, if C contains exactly the same subcluster pointers as
does one of the files Fi from among F1 ... Fn, then add a pointer to
file Fi to list M;
11. otherwise:
12. Select an arbitrary server S2 on the network, [...] group C and
choosing [...].
13. Send a request message to server S2 that includes the subcluster
pointers in group C and requests server S2 to merge the corresponding
subcluster trees.
14. Receive a response from server S2, containing a pointer to a file G
that represents the merged tree. Add this pointer to list M.
15. For each file Fi from among F1 ... Fn:
16. If list M does not include a pointer to file Fi, send a message to
the server or servers storing Fi instructing them to delete file Fi.
17. Create and store a file F that represents a new cluster, whose
subcluster pointers are exactly the subcluster pointers on list M.
18. Send a reply message to server S0, which reply message contains a
pointer to file F and indicates that file F represents the merged
cluster tree.

With the help of the above procedure, and the multicast tree MT_full
that includes all proxy servers in the network, the distributed
hierarchical cluster tree for a particular domain of target objects is
constructed by merging many local hierarchical cluster trees, as
follows.
1. One server S (preferably one with good connectivity) is elected from
the tree.
2. Server S sends itself a global request message that causes each
proxy server in MT_full (that is, each proxy server in the network) to
ask its clients for files for the cluster tree.
3. The clients of each proxy server transmit to the proxy server any
files that they maintain, which files represent target objects from the
appropriate domain that should be added to the cluster tree.
4. Server S forms a request R1 that, upon receipt, will cause the
recipient server S1 to take the following actions:
(a) Build a hierarchical cluster tree of all the files stored on server
S1 that are maintained by users in the user base of S1. These files
correspond to target objects from the appropriate domain. This cluster
tree is typically stored entirely on S1, but may in principle be stored
in a distributed fashion.
(b) Wait until all servers to which the server S1 has propagated
request R1 have sent the recipient reply messages containing pointers
to cluster trees.
(c) Merge together the cluster tree created in step 4(a) and the
cluster trees supplied in step 4(b), by sending any server (such as S1
itself) a message requesting such a merge, as described above.
(d) Upon receiving a reply to the message sent in (c), which reply
includes a pointer to a file representing the merged cluster tree,
forward this reply to the sender of request R1, unless this is S1
itself.
5. Server S sends itself a global request message that causes all
servers in MT_full to act on embedded request R1.
6. Server S receives a reply to the message it sent in 4(c). This reply
includes a pointer to a file F that represents the completed
hierarchical cluster tree. Server S multicasts file F to all proxy
servers in MT_full.
Once the hierarchical cluster tree has been created as above, server S
can send additional messages through the cluster tree, to arrange that
multicast trees MT(C) are created for sufficiently large clusters C,
and that each file F is multicast to the tree MT(C), where C is the
smallest cluster containing file F.

VIRTUAL COMMUNITIES AND THE VIRTUAL ORGANIZATION

Matching users for Virtual Communities on the Internet

Computer users frequently join other users for discussions on computer
bulletin boards, newsgroups, mailing lists, and real-time chat sessions
over the computer network, which may be typed (as with Internet Relay
Chat (IRC)), spoken (as with Internet phone), or videoconferenced.
These forums are herein termed "virtual communities." In current
practice, each virtual community has a specified topic, and users find
communities of interest by word of mouth or by [...] communities
(typically hundreds or [...] among those posted to the selected virtual
communities, that is, made publicly available to members of those
communities. If they desire, they may also write additional messages
and post them to the virtual communities of their choice. The existence
of thousands of Internet bulletin boards (also termed newsgroups) and
countless more Internet mailing lists and private bulletin board
services (BBS's) demonstrates the very strong interest among members of
the electronic community in forums for the discussion of ideas about
almost any subject imaginable. Presently, virtual community creation
proceeds in a haphazard form, usually instigated by a single individual
who decides that a topic is worthy of discussion. There are protocols
on the Internet for voting to determine whether a newsgroup should be
created, but there is a large hierarchy of newsgroups (which begin with
the prefix "alt.") that do not follow this protocol.

The system for customized electronic identification of desirable
objects described herein can of course function as a browser for
bulletin boards, where target objects are taken to be bulletin boards,
or subtopics of bulletin boards, and each target profile is the cluster
profile for a cluster of documents posted on some bulletin board. Thus,
a user can locate bulletin boards of interest by all the navigational
techniques described above, including browsing and querying. However,
this method only serves to locate existing virtual communities. Because
people have varied and varying complex interests, it is desirable to
automatically locate groups of people with common interests in order to
form virtual communities. The Virtual Community Service (VCS) described
below is a network-based agent that seeks out users of a network with
common interests, dynamically creates bulletin boards or electronic
mailing lists for those users, and introduces them to each other
electronically via e-mail. It is useful to note that once virtual
communities have been created by VCS, the other browsing and filtering
technologies described above can subsequently be used to help a user
locate particular virtual communities (whether pre-existing or
automatically generated by VCS); similarly, since the messages sent to
a given virtual community may vary in interest and urgency for a user
who has joined that community, these browsing and filtering
technologies (such as the e-mail filter) can also be used to alert the
user to urgent messages and to screen out uninteresting ones.

The functions of the Virtual Community Service are general functions
that could be implemented on any network ranging from an office network
in a small company to the World Wide Web or the Internet. The four main
steps in the procedure are:
1. Scan postings to existing virtual communities.
2. Identify groups of users with common interests.
3. Match users with virtual communities, creating new virtual
communities when necessary.
4. Continue to enroll additional users in the existing virtual
communities.

More generally, users may post messages to virtual communities
pseudonymously, even employing different pseudonyms for different
virtual communities. (Posts not employing a pseudonymous mix path may,
as usual, be considered to be posts employing a non-secure pseudonym,
namely the user's true network address.) Therefore, the above steps may
be expressed more generally as follows:
1. Scan pseudonymous postings to existing virtual communities.
2. Identify groups of pseudonyms whose associated users have common
interests.
3. Match pseudonymous users with virtual communities, creating new
virtual communities when necessary.
4. Continue to enroll additional pseudonymous users in the existing
virtual communities.
Each of these steps can be carried out as described below.
Virtual Organization

E-mail Groupware on the Intranet (Intranet applications)

Another application of Virtual Communities is the application to
virtual organizations. Organizations may use the above described
techniques in accordance with their unique circumstances of intranet
enabled communications involving telephony, voice and video
conferencing, voice mail groupware and e-mail. By enabling users to
better communicate, route messages by matching users together with each
other or filtering e-mail or voice messages, the following viable
applications apply to the techniques of the previously described
technologies including matching users in virtual communities on the
Internet and those described in the previous sections.
E-mail Filter 

In addition to the news clipping service described above, the system
for customized electronic identification of desirable objects functions
in an e-mail environment in a similar but slightly different manner.
The news clipping service selects and retrieves news information that
would not otherwise reach its subscribers. But at the same time, large
numbers of e-mail messages do reach users, having been generated and
sent by humans or automatic programs. These users need an e-mail
filter, which automatically processes the messages received. The
necessary processing includes a determination of the action to be taken
with each message, including, but not limited to: filing the message,
notifying the user of receipt of a high priority message, automatically
responding to a message. The e-mail filter system must not require too
great an investment on the part of the user to learn and use, and the
user must have confidence in the appropriateness of the actions
automatically taken by the system. The same filter may be applied to
voice mail messages or facsimile messages that have been converted into
electronically stored text, whether automatically or at the user's
request, via the use of well-known techniques for speech recognition or
optical character recognition.

The filtering problem can be defined as follows: a message processing
function MPF(*) maps from a received message (document) to one or more
of a set of actions. The actions, which may be quite specific, may be
either predefined or customized by the user. Each action A has an
appropriateness function F_A(*,*) such that F_A(U, D) returns a real
number, representing the appropriateness of selecting action A on
behalf of user U when user U is in receipt of message D. For example,
if D comes from a credible source and is marked urgent, then discarding
the message has a high cost to the user and has low appropriateness, so
that F_discard(U, D) is small, whereas alerting the user of receipt of
the message is highly appropriate, so that F_alert(U, D) is large.
Given the determined appropriateness function, the function MPF(D) is
used to automatically select the appropriate action or actions. As an
example, the following set of actions might be useful:

1. Urgently notify user of receipt of message and/or insert 
message higher in the queue indicating its priority. 
2. Insert message into queue for user to read later

3. Insert message into queue for user to read later, and 
suggest that user reply 

4. Insert message into queue for user to read later, and
suggest that user forward it to individual R where
individual R's profile indicates that the message is
relevant to him/her, or suggest that the message be sent
as a voice mail using text to speech or as a fax or e-mail.
The message may also be in the form of voice mail or
voice e-mail.

5. Summarize message and insert summary into queue 

6. Forward message to user's secretary 

7. File message in directory X 

8. File message in directory Y 

9. Delete message (i.e., ignore message and do not save) 
and/or 

10. Notify sender that further messages on this subject are 
unwanted 

11. Provide a form auto request response that the sender 
of the e-mail (or voice mail) message will be ignored 
(and that it will be deleted). 

12. Send a form auto response to the sender of an e-mail 
message that the user is out of town where the identity 
(or user profile) determines the selection of the 
response message. 

13. Send a form auto response message to an individual
to whom the user does not want to reply directly.

14. Similarly provide an auto response voice mail mes- 
sage that is specific (or most relevant) to the identity of 
the caller. 

15. Suggest to the user to authorize a form auto request for 
deletion from a mailing list. Provide an automatic call 
screening function. 

16. Provide an automatic call screening function wherein 
depending on the caller's identity to determine whether 
to allow the call to pass through to the secretary or user 
or to prompt the user to indicate the nature/purpose of 
his/her call using a speech to text conversion module to 
automatically select the most appropriate auto response 
message, whether to forward the call to the user's 
secretary, forward the call directly to the user, or 
automatically page the user, or request that the user not 
call back where these determinations are made based 
upon the identity of the caller and/or the stated objec- 
tives of the call or automatically forward the call to 
another user whose profile is more relevant. In this 
scenario if the user so desires if the call is forwarded 
directly to the user or if the user is paged while the 
caller is holding or if upon the system's determination 
it is forwarded to the user's voice mail, the user may 
identify the caller and/or listen to his/her stated objec- 
tive of the call or automatically inform the caller based 
upon his/her identity and/or stated calling objective not 
to call back (where the voice mail option is not 
provided). 

17. Notify user periodically that message "x" requests and 
warrants a reply due to its urgency and remind users 
periodically. 

18. Automatically recommend to the user a mailing list of 
the most appropriate prospective recipients of a given 
outbound e-mail message. This list is determined by
both the user's previous e-mail activities regarding
those prospective recipients and their user profiles as
well.
19. Accordingly suggest to the user a mailing list or
automatically forward incoming e-mail messages 
which have been received wherein the user is not the 
most appropriate recipient for that message (if appro- 
priate the forwarding party may also view the profile of 
the recommended recipient(s) prior to approving the 
recommendation). This system may also be used as an 
e-mail router for incoming e-mail or voice mail coming 
into an organization which occurs automatically or 
upon a human's approval. 
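
The selection step performed by MPF can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the message fields (`credible_source`, `marked_urgent`), the three toy appropriateness functions, their numeric values, and the optional `threshold` parameter are hypothetical and are not taken from the system described here. The sketch shows MPF choosing the action or actions whose appropriateness F_A(U, D) is highest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class User:
    name: str


@dataclass
class Message:
    # Hypothetical features; a real filter would derive them from the
    # message's profile rather than from two hand-set flags.
    credible_source: bool
    marked_urgent: bool


# F_A(U, D): appropriateness of taking action A for user U on message D.
Appropriateness = Callable[[User, Message], float]


def f_alert(user: User, msg: Message) -> float:
    # Alerting is highly appropriate for urgent mail from a credible source.
    return 0.9 if (msg.credible_source and msg.marked_urgent) else 0.2


def f_queue(user: User, msg: Message) -> float:
    # Queuing for later reading is a neutral default (assumed constant).
    return 0.5


def f_discard(user: User, msg: Message) -> float:
    # Discarding urgent, credible mail is costly, so the value is small.
    return 0.05 if (msg.credible_source and msg.marked_urgent) else 0.4


ACTIONS: Dict[str, Appropriateness] = {
    "alert": f_alert,
    "queue": f_queue,
    "discard": f_discard,
}


def mpf(user: User, msg: Message,
        actions: Dict[str, Appropriateness] = ACTIONS,
        threshold: Optional[float] = None) -> List[str]:
    """Select the most appropriate action(s) for this user and message."""
    scores = {name: f(user, msg) for name, f in actions.items()}
    best = max(scores.values())
    top = [name for name, s in scores.items() if s == best]
    if threshold is None:
        return top
    # With a threshold, every sufficiently appropriate action is taken;
    # if none qualifies, fall back to the best-scoring action(s).
    chosen = [name for name, s in scores.items() if s >= threshold]
    return chosen or top


if __name__ == "__main__":
    u = User("U")
    print(mpf(u, Message(True, True)))                    # ['alert']
    print(mpf(u, Message(False, False), threshold=0.4))   # ['queue', 'discard']
```

Returning a list rather than a single action reflects the statement that MPF maps a message to one or more actions; whether ties or a threshold are the right way to select several actions is a design choice of this sketch.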
The above appropriateness functions may of course instead first be
manually entered as if-then rules, a technique well known in the art.
Additionally, these rules may instead be generated automatically (as
suggested herein), in which case the user may approve or rewrite a
recommended appropriateness function (e.g., it may indicate that if the
value of a specific word in the message exceeds value X, perform
appropriateness function Y).
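
A rule of the kind just mentioned (if the value of a specific word exceeds X, perform Y) might be represented as follows. This Python fragment is a minimal sketch, assuming that the "value" of a word is a numeric weight stored in a message profile (a dictionary from words to weights) and that an automatically proposed rule stays inactive until the user approves it; the `Rule` class, its field names, and the example weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    word: str          # the specific word the rule inspects
    threshold: float   # value X that the word's weight must exceed
    action: str        # action Y to perform when the rule fires
    approved: bool = False  # proposed rules await the user's approval


def fire_rules(profile: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Return the actions of all approved rules whose condition holds."""
    actions: List[str] = []
    for rule in rules:
        weight = profile.get(rule.word, 0.0)  # absent words weigh nothing
        if rule.approved and weight > rule.threshold:
            if rule.action not in actions:    # avoid duplicate actions
                actions.append(rule.action)
    return actions


if __name__ == "__main__":
    rules = [
        Rule("invoice", 0.10, "file message in directory X", approved=True),
        Rule("sale", 0.05, "delete message"),  # proposed, not yet approved
    ]
    profile = {"invoice": 0.30, "sale": 0.20}
    print(fire_rules(profile, rules))  # ['file message in directory X']
```

Rewriting a recommended rule then amounts to editing `word`, `threshold`, or `action` before setting `approved` to true.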

Additional applications of the present methods are conceivable. For
example, in the case of sending, forwarding (or reforwarding) messages
to users based upon appropriateness functions relating to the profiles
of the message and prospective recipients, it is possible to use this
technique to allow users to more efficiently submit queries for
response by users within any intranet, an inter-organizational intranet
(extranet) or the Internet. An example application of the scenario is
as follows:

1, A newbie submits a query by web or e-mail. 

2. The engine shows the user a few answers such that 
similar newbie, query, answer triples have been highly 
rated. (One kind of answer consists of nothing but the 
URL of a he^jful site!) 



latter case, the system might allow the newbie to edit 
the query first. The edited query would be included in 
the go-ahcad to the next c;q)crt.) 

10. If in step 5 or step 9 none of the (remaining) experts 
have indicated interest, within a reasonable time after 
the question was originally posed, then the system 
slowly offers the question to more experts (as in Step 
4), up to a reasonable limit, until it does get a bite, 

11. Any expert who received a request but ignored it gets 
a relevance feedback value of 0 for that query. Any 
expert who gave a go-ahead, but didn't get to answer, 
For choosing an expert, some interesting attributes of 
an expert are usual time to respond, length of response, 
count of technical term in response — since different 
users may have different sensitivities to these factors. 
Also the text of queries/list of queries they've 
answered, what clusters of newbies has rated them 
highly, etc. Finally, the set of terms in their explicit • 
declarations of interest, and in their responses: this 
helps chister them both with queries and with other 
experts. If we had a billing mechanism (which would 
probably require collaboration with AFL or someone, 
since its currently hard to collect from a user who only 
spends $l/month on queries), here would be a rough 
pricing model: When a question lands on your desk, it 
comes accompanied by an offer of payment. So the 
system looks for an expert, price pair such that newbie, 
query, expert, price, time-of-^ay id highly rated, mean- 
ing: 

this expert is likely to answer this question for this price 
at this time 

this newbie will be satisfied with the tradeoff between 



answer and price paid 

3. If the user finds these answers unsatisfactory, the engine
takes note of this feedback. Then it goes to plan B, and
finds a few experts such that the (newbie, query, expert,
time-of-day) tuples have been highly rated.

4. The system offers all of these experts the question by
e-mail.

5. First expert to indicate interest in the offer (by replying
"yes") gets a go-ahead from the system.

6. Expert replies by sending an answer from the system.
S/he may reply if further dialogue is needed; a conversation
can continue in this way indefinitely. Of
course, it all goes through the system, so it's all
pseudonymous and logged. (Sometimes the correspondence
may go off-topic. There should be a mechanism
for dealing with this, so that rambling (or personal)
discussion won't appear in its entirety as part of that
database. E.g., if I want to go off-topic with my next
message itself. The system then forwards the message
as usual, but with my real return address as the Reply-to
field. Further correspondence (if the other correspondent
chooses to reply) then occurs with real names and
outside the system.)

8. The newbie rates the quality of the dialogue, as a
precondition for being allowed to ask more questions.
(The expert is allowed to rate it too, so that the system
knows which questions the expert LIKES to answer,
not just which ones s/he WILL answer.)

9. If the dialogue never took place, because some expert
replied in step 5 but didn't continue to step 6 within a
reasonable time, the system sends a go-ahead to the
next most appropriate of the experts who indicated
interest in step 5. It also does this if the newbie got an
answer but said (in step 8) that it was unhelpful. (In the

This ought to work fine, in terms of getting offered prices to
fluctuate correctly. It does mean that it's hard to lower your
rates (in a particular area) once the system has decided
you're expensive and stopped sending you queries, but there
are ways around this (e.g., you could always actively notify
the system of your new approximate rates, either out of the
blue or when responding to a request. In addition, the system
might e-mail inactive experts every so often, asking if they
want to lower their rates, declare additional interests, be
dropped from the rolls, etc.). There is also a free-of-charge
model, which is presumably the best way to start. It might
involve some or all of these elements:



Get nice idealists to participate as experts, by advertising
on Usenet (and/or by actually seeding the database with
Usenet postings from selected groups, so that people
may be experts without knowing it). I think there are
some people who would participate freely given that
only a few people have to see each question, so they
won't get many; it would reduce Usenet traffic, where
everyone has to see all the questions; the answer
would be permanently on file, and they could sign it
(good for visibility!)

if they ignore the questions they'll just go away. 

Attract advertising. 

The benign kind of advertising: plugs in signs and on 
web sites 

The sleazy kind: A query about word processors or 
WordPerfect is highly likely to draw an on-file 
"expert" response touting Microsoft Word 

The semi-sleazy kind: the expert responses to the query
are uncompromised (they're genuinely highly rated)
but an MSoft advert labeled as such is attached.
(Apparently IBM bought the queries "Microsoft"
and "gates" on Lycos!)
Use play money. By answering questions, you can build
up credit that you can use to ask questions. However,
if you go too deeply into debt, you have to fork over
real money (or accept advertising). If you go well into
profit, you can cash in. One could imagine eventually
using this system as the seed of a VCS chat service,
where queries consisted of topics advertisements.
(We'd just have to allow new people to get added to
existing conversations.) It's also a good way for
consultants, brokers, mechanics, etc., to advertise their
expertise (remember that answers can be paid for
on-line, or a negotiation taken off-line). And for the
same reason, I could all-too-easily imagine it replacing
1-900 phone sex numbers. (Hey, ratings, price and all!)
This matching criteria includes interest attributes.
Though this market model is useful for the above
example, it is readily applicable to any of the aforementioned
applications (to retrieving information,
human experts, employers and employees, buyers and
sellers), and may be applied likewise to any product,
commodity, share or interest that may be exchanged in
an open market, e.g., stocks, commodities, insurance
policies, products (bought and sold or bartered).
Domains of application for the Internet-wide market
system include fields such as legal counseling, medicine,
engineering, psychological/sociological services and
computer solutions, as well as more subjective domains
such as architectural design, product design, document
authoring, landscaping, decor (personalized fashion
design) and cosmetics, as well as informal solutions to
problems of individuals based on their unique life and
professional experiences and encounters. Additionally,
some experts may choose to use a filtering functionality
on their system with preset parameters, such as that the price
of a given task must meet a preset minimum to qualify.

Notice that actions 8 and 9 in the sample list above are
designed to filter out messages that are undesirable to the
user or that are received from undesirable sources, such as
pesky salespersons, by deleting the unwanted message and/or
sending a reply that indicates that messages of this type
will not be read. The appropriateness functions must be
tailored to describe the appropriateness of carrying out each
action given the target profile for a particular document, and
then a message processing function MPF can be found
which is in some sense optimal with respect to the
appropriateness function. One reasonable choice of MPF always
picks the action with highest appropriateness, and in cases
where multiple actions are highly appropriate and are also
compatible with each other, selects more than one action: for
example, it may automatically reply to a message and also
file the same message in directory X, so that the value of
MPF(D) is the set {reply, file in directory X}. In cases
where the appropriateness of even the most appropriate
action falls below a user-specified threshold, as should
happen for messages of an unfamiliar type, the system asks
the user for confirmation of the action(s) selected by MPF.
In addition, in cases where MPF selects one action over
another action that is nearly as appropriate, the system also
asks the user for confirmation: for example, mail should not
be deleted if it is nearly as appropriate to let the user see it.

It is possible to write appropriateness functions manually,
but the time necessary and lack of user expertise render this
solution impractical. The automatic training of this system is
preferable, using the automatic user profiling system
described above. Each received document is viewed as a
target object whose profile includes such attributes as the
entire text of the document (represented as TF/IDF scores),
document sender, date sent, document length, date of last
document received from this sender, key words, list of other
addressees, etc. It was disclosed above how to estimate an
interest function on profiled target objects, using relevance
feedback together with measured similarities among target
objects and among users. In the context of the e-mail filter,
the task is to estimate several appropriateness functions
(*,*), one per action. This is handled with exactly the same
method as was used earlier to estimate the topical interest
function f(*,*). Relevance feedback in this case is provided
by the user's observed actions over time: whenever user U
chooses action A on document D, either freely or by choosing
or confirming an action recommended by the system,
this is taken to mean that the appropriateness of action A on
document D is high, particularly if the user takes this action
A immediately after seeing document D. A presumption of
no appropriateness (corresponding to the earlier presumption
of no interest) is used so that action A is considered
inappropriate on a document unless the user or similar users
have taken action A on this document or similar documents.
In particular, if no similar document has been seen, no action
is considered especially appropriate, and the e-mail filter
asks the user to specify the appropriate action or confirm that
the action chosen by the e-mail filter is the appropriate one.
Thus, the e-mail filter learns to take particular actions on
e-mail messages that have certain attributes or combinations
of attributes. For example, messages from John Doe
that originate in the (212) area code may prompt the system
to forward a copy by fax transmission to a given fax number,
or to file the message in directory X on the user's client
processor. A variation allows active requests of this form
from the user, such as a request that any message from John
Doe be forwarded to a desired fax number until further
notice. This active user input requires the use of a natural
language or form-based interface for which specific commands
are associated with particular attributes and combinations
of attributes.

Scanning

Using the technology described above, Virtual Community
Service constantly scans all the messages posted to all
the newsgroups and electronic mailing lists on a given
network, and constructs a target profile for each message
found. The network can be the Internet, or a set of bulletin
boards maintained by America Online, Prodigy, or
CompuServe, or a smaller set of bulletin boards that might
be local to a single organization, for example a large
company, a law firm, or a university. The scanning activity
need not be confined to bulletin boards and mailing lists that
were created by Virtual Community Service, but may also be
used to scan the activity of communities that predate Virtual
Community Service or are otherwise created by means
outside the Virtual Community Service system, provided
that these communities are public or otherwise grant their
permission.

The target profile of each message includes textual
attributes specifying the title and body text of the message.
In the case of a spoken rather than written message, the latter
attribute may be computed from the acoustic speech data by
using a speech recognition system. The target profile also
includes an associative attribute listing the author(s) and
designated recipient(s) of the message, where the recipients
may be individuals and/or entire virtual communities; if this
attribute is highly weighted, then the system tends to regard
messages among the same set of people as being similar or
related, even if the topical similarity of the messages is not
clear from their content, as may happen when some of the
messages are very short. Other important attributes include
the fraction of the message that consists of quoted material
from previous messages, as well as attributes that are
generally useful in characterizing documents, such as the
message's date, length, and reading level.

Virtual Community Identification

Next, Virtual Community Service attempts to identify
groups of pseudonymous users with common interests.
These groups, herein termed "pre-communities," are represented
as sets of pseudonyms. Whenever Virtual Community
Service identifies a pre-community, it will subsequently
attempt to put the users in said pre-community in contact
with each other, as described below. Each pre-community is
said to be "determined" by a cluster of messages, pseudonymous
users, search profiles, or target objects.

In the usual method for determining pre-communities,
Virtual Community Service clusters the messages that were
scanned and profiled in the above step, based on the similarity
of those messages' computed target profiles, thus
automatically finding threads of discussion that show common
interests among the users. Naturally, discussions in a
single virtual community tend to show common interests;
however, this method uses all the texts from every available
virtual community, including bulletin boards and electronic
mailing lists. Indeed, a user who wishes to initiate or join a
discussion on some topic may send a "feeler message" on
that topic to a special mailing list designated for feeler
messages; as a consequence of the scanning procedure described
above, the feeler message is automatically grouped with any
similarly profiled messages that have been sent to this
special mailing list, to topical mailing lists, or to topical
bulletin boards. The clustering step employs "soft
clustering," in which a message may belong to multiple
clusters and hence to multiple virtual communities. Each
cluster of messages that is found by Virtual Community
Service and that is of sufficient size (for example, 10-20
different messages) determines a pre-community whose
members are the pseudonymous authors and recipients of
the messages in the cluster. More precisely, the pre-community
consists of the various pseudonyms under which
the messages in the cluster were sent and received.

Alternative methods for determining a pre-community,
which do not require the scanning step above, include the
following:

1. Pre-communities can be generated by grouping
together users who have similar interests of any sort, not
merely individuals who have already written or received
messages about similar topics. If the user profile associated
with each pseudonym indicates the user's interests, for
example through an associative attribute that indicates the
documents or Web sites a user likes, then pseudonyms can
be clustered based on the similarity of their associated user
profiles, and each of the resulting clusters of pseudonyms
determines a pre-community comprising the pseudonyms in
the cluster.

2. If each pseudonym has an associated search
profile set formed through participation in the news clipping
service described above, then all search profiles of all
pseudonymous users can be clustered based on their
similarity, and each cluster of search profiles determines a
pre-community whose members are the pseudonyms from
whose search profile sets the search profiles in the cluster are
drawn. Such groups of people have been reading about the
same topic (or, more generally, accessing similar target
objects) and so presumably share an interest.

3. If users
participate in a news clipping service or any other filtering
or browsing system for target objects, then an individual
user can pseudonymously request the formation of a virtual
community to discuss a particular cluster of one or more
target objects known to that system. This cluster of target
objects determines a pre-community consisting of the
pseudonyms of users determined to be most interested in that
cluster (for example, users who have search profiles similar
to the cluster profile), together with the pseudonym of the
user who requested formation of the virtual community.

Matching Users with Communities

Once Virtual Community Service identifies a cluster C of
messages, users, search profiles, or target objects that
determines a pre-community M, it attempts to arrange for the
members of this pre-community to have the chance to
participate in a common virtual community V. In many
cases, an existing virtual community V may suit the needs of
the pre-community M. Virtual Community Service first
attempts to find such an existing community V. In the case
where cluster C is a cluster of messages, V may be chosen
to be any existing virtual community such that the cluster
profile of cluster C is within a threshold distance of the mean
profile of the set of messages recently posted to virtual
community V; in the case where cluster C is a cluster of
users, V may be chosen to be any existing virtual community
such that the cluster profile of cluster C is within a threshold
distance of the mean user profile of the active members of
virtual community V; in the case where the cluster C is a
cluster of search profiles, V may be chosen to be any existing
virtual community such that the cluster profile of cluster C
is within a threshold distance of the cluster profile of the
largest cluster resulting from clustering all the search profiles
of active members of virtual community V; and in the
case where the cluster C is a cluster of one or more target
objects chosen from a separate browsing or filtering system,
V may be chosen to be any existing virtual community
initiated in the same way from a cluster whose cluster profile
in that other system is within a threshold distance of the
cluster profile of cluster C. The threshold distance used in
each case is optionally dependent on the cluster variance or
cluster diameter of the profile sets whose means are being
compared.

If no existing virtual community V meets these conditions
and is also willing to accept all the users in pre-community
M as new members, then Virtual Community Service
attempts to create a new virtual community V. Regardless of
whether virtual community V is an existing community or a
newly created community, Virtual Community Service
sends an e-mail message to each pseudonym P in pre-community
M whose associated user U does not already
belong to virtual community V (under pseudonym P) and
has not previously turned down a request to join virtual
community V. The e-mail message informs user U of the
existence of virtual community V, and provides instructions
which user U may follow in order to join virtual community
V if desired; these instructions vary depending on whether
virtual community V is an existing community or a new
community. The message includes a credential, granted to
pseudonym P, which credential must be presented by user U
upon joining the virtual community V, as proof that user U
was actually invited to join. If user U wishes to join virtual
community V under a different pseudonym Q, user U may
first transfer the credential from pseudonym P to pseudonym
Q, as described above. The e-mail message further provides
an indication of the common interests of the community, for
example by including a list of titles of messages recently
sent to the community, or a charter or introductory message
provided by the community (if available), or a label generated
by the methods described above that identifies the
content of the cluster of messages, user profiles, search
profiles, or target objects that was used to identify the
pre-community M.
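The first matching case (a cluster C of messages compared against the mean profile of messages recently posted to each existing community) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the specification's own procedure: profiles are assumed to be equal-length numeric vectors, Euclidean distance stands in for whatever profile distance the system actually uses, and the function and parameter names are hypothetical. The text allows any community within the threshold; the sketch picks the closest one as a design choice.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def mean_profile(profiles: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length profiles."""
    n = len(profiles)
    dim = len(profiles[0])
    return [sum(p[i] for p in profiles) / n for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed stand-in for the profile metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_existing_community(
    cluster_profile: Vector,
    recent_messages: Dict[str, List[Vector]],
    threshold: float,
) -> Optional[str]:
    """Return the closest community whose mean recent-message profile lies
    within `threshold` of the cluster profile, or None if there is none
    (in which case a new community would have to be created)."""
    best_name: Optional[str] = None
    best_dist = threshold
    for name, profiles in recent_messages.items():
        if not profiles:
            continue  # no recent messages, so nothing to compare against
        d = distance(cluster_profile, mean_profile(profiles))
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name
```

The other three cases differ only in which reference profile is computed for each community (mean user profile, largest search-profile cluster, or originating target-object cluster), so the same comparison loop would apply.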
If Virtual Community Service must create a new community
V, several methods are available for enabling the
members of the new community to communicate with each
other. If the pre-community M is large, for example containing
more than 50 users, then Virtual Community Service
typically establishes either a multicast tree, as described
below, or a widely-distributed bulletin board, assigning a
name to the new bulletin board. If the pre-community M has
fewer members, for example 2-50, Virtual Community
Service typically establishes either a multicast tree, as
described below, or an e-mail mailing list. If the new virtual
community V was determined by a cluster of messages, then
Virtual Community Service kicks off the discussion by
distributing these messages to all members of virtual community
V. In addition to bulletin boards and mailing lists,
alternative fora that can be created and in which virtual
communities can gather include real-time typed or spoken
conversations (or engagement or distributed multi-user
applications including video games) over the computer
network and physical meetings, any of which can be scheduled
by a partly automated process wherein Virtual Community
Service requests meeting time preferences from all
members of the pre-community M and then notifies these
individuals of an appropriate meeting time.

For multi-user applications, users may be matched
together who share a high level of interest in that application
or the particular type of content therein, as with educational
software, entertainment applications or groupware (e.g.,
intra-organizational) where users may participate remotely
in an application. Any of these multi-user applications may
involve automatic calendaring (by a scheduling agent) for
the purpose of arranging a virtual session between users who
share a common interest in the nature or content of the
application (e.g., a high speed action or suspense adventure
video game) or for some applications (e.g., document editing
groupware) users may sometimes require synchronous
sessions or they may participate asynchronously.
Conversely, users who are currently engaged in a multi-user
session may allow the VCS agent to notify or page remote
users who may be interested in participating, as in entertainment
type applications, or whose presence (or contribution)
they feel is needed, as with groupware used in an organizational
or professional context (such as with on-line
conferencing, whiteboarding, document editing, virtual corporate
meetings, etc.). Matching together users in these
applications assumes that within the current session, prospective
participants share the same (or similar) application,
thus are profiled accordingly to the nature of the application,
the list of current participants and if relevant secondarily to
the content of the interacting user's dialogs (such as text or
voice chat).

Specifically users are likely to have a common interest in
the nature of an application which can be jointly (passively)
interacted with or jointly viewed such as the content of the
document being edited, the profile of a video being viewed
or a site being visited by a group of users collaboratively
navigating the WWW (or intra-organizational Web). A useful
approach to advertising in a virtual chat room, conference
or multi-user application is using the current temporal
profile of the collaborative interaction as a target profile for
which to target ads in real time and dynamically change the
ad presentation as the topical relevance of the interaction
changes, which is then viewed by all of the collaborative
participants simultaneously. In a variation using similar
techniques to those used in the above e-mail filter section,
one appropriateness function which the system could write
could be recommending to a user (such as an employer)
whether or not a virtual meeting (or transcript thereof)
should be made accessible to each employee in the organization
based upon access privileges to particular types of
content granted in the past and other aspects of his/her
profile. This technique may be applied more generally as
well to augment access control to information by employees
in the organization in general.

In accordance with currently used methods, voice and fax
numbers may change dynamically in accordance with the
user's physical location. Specifically users should first be
matched according to their common interest in a type of
application which can be jointly interacted with or jointly
viewed passively (via PC or TV). Then, secondly, users
within such a common interest group may be further subdivided
into sub-communities according to more specific
common interests which they share (such as sub-communities
of real time correspondents simultaneously
watching a popular program on television) or according to
content profile of the real time dialogues which the users are
engaged in, e.g., as they jointly navigate the World Wide
Web, view a video program or television debate or engage
in a video game. Conversely where the forum is smaller
and/or the objectives are more objectively identified, sub-interest
groups may be irrelevant, for example, on-line
seminars, organizational meetings or board meetings in
which relevant users whose presence or participation is
requested may be automatically scheduled (by a scheduling
agent) in advance or the user may be notified or paged if
topical relevancy to the user's interest (or professional
interest) profile is identified in real time by the VCS agent
initially (or throughout the course of the meeting).

Continued Enrollment

Even after creation of a new virtual community, Virtual
Community Service continues to scan other virtual communities
for new messages whose target profiles are similar to
the community's cluster profile (average message profile).
Copies of any such messages are sent to the new virtual
community, and the pseudonymous authors of these
messages, as well as users who show high interest in reading
such messages, are informed by Virtual Community Service
(as for pre-community members, above) that they may want
to join the community. Each such user can then decide
whether or not to join the community. In the case of Internet
Relay Chat (IRC), if the target profile of messages in a real
time dialog is (or becomes) similar to that of a user, VCS
may also send an urgent e-mail message to such user
whereby the user may be automatically notified as soon as
the dialog appears, if desired.

With these facilities, Virtual Community Service provides
automatic creation of new virtual communities in any local
or wide-area network, as well as maintenance of all virtual
communities on the network, including those not created by
Virtual Community Service. The core technology underlying
Virtual Community Service is creating a search and
clustering mechanism that can find articles that are "similar"
in that the users share interests. This is precisely what was
described above. One must be sure that Virtual Community
Service does not bombard users with notices about communities
in which they have no real interest. On a very small
network a human could be "in the loop", scanning proposed
virtual communities and perhaps even giving them names.
But on larger networks Virtual Community Service has to
run in fully automatic mode, since it is likely to find a large
number of virtual communities.

Delivering Messages to a Virtual Community

Once a virtual community has been identified, it is
straightforward for Virtual Community Service to establish
a mailing list so that any member of the virtual community
may distribute e-mail to all other members. Another method
of distribution is to use a conventional network bulletin
board or newsgroup to distribute the messages to all servers
in the network, where they can be accessed by any member
of the virtual community. However, these simple methods do
not take into account cost and performance advantages
which accrue from optimizing the construction of a multicast
tree to carry messages to the virtual community. Unlike
a newsgroup, a multicast tree distributes messages to only a
selected set of servers, and unlike an e-mail mailing list it
does so efficiently.

A separate multicast tree MT(V) is maintained for each
virtual community V, by use of the following four procedures.

1. To construct or reconstruct this multicast tree, the
core servers for virtual community V are taken to be those
proxy servers that serve at least one pseudonymous member
of virtual community V. Then the multicast tree MT(V) is
established via steps 4-6 in the section "Multicast Tree
Construction Procedure" above.

2. When a new user joins
virtual community V, which is an existing virtual
community, the user sends a message to the user's proxy
server S. If user's proxy server S is not already a core server
for V, then it is designated as a core server and is added to
the multicast tree MT(V), as follows. If more than k servers
have been added since the last time the multicast tree MT(V)
was rebuilt, where k is a function of the number of core
servers already in the tree, then the entire tree is simply
rebuilt via steps 4-6 in the section "Multicast Tree Construction
Procedure" above. Otherwise, server S retrieves its
locally stored list of nearby core servers for V, and chooses
a server S1. Server S sends a control message to S1,
indicating that it would like to be added to the multicast tree
MT(V). Upon receipt of this message, server S1 retrieves its
locally stored subtree G1 of MT(V), and forms a new graph
G from G1 by removing all degree-1 vertices other than S1
itself. Server S1 transmits graph G to server S, which stores
it as its locally stored subtree of MT(V). Finally, server S
sends a message to itself and to all servers that are vertices
of graph G, instructing these servers to modify their locally
stored subtrees of MT(V) by adding S as a vertex and adding
an edge between S1 and S.

3. When a user at a client q
wishes to send a message F to virtual community V, client
q embeds message F in a request R instructing the recipient
to store message F locally, for a limited time, for access by
members of virtual community V. Request R includes a
credential proving that the user is a member of virtual
community V or is otherwise entitled to post messages to
virtual community V (for example is not "black marked" by
that or other virtual community members). Client q then
broadcasts request R to all core servers in the multicast tree
MT(V), by means of a global request message transmitted to
the user's proxy server as described above. The core servers
satisfy request R, provided that they can verify the included
credential.

4. In order to retrieve a particular message sent to
virtual community V, a user U at client q initiates the steps
described in section "Retrieving Files from a Multicast
Tree," above. If user U does not want to retrieve a particular
message, but rather wants to retrieve all new messages sent
to virtual community V, then user U pseudonymously
instructs its proxy server (which is a core server for V) to
send it all messages that were multicast to MT(V) after a
certain date. In either case, user U must provide a credential
proving user U to be a member of virtual community V, or
otherwise entitled to access messages on virtual community
V.

APPENDED COLLABORATIVE COMPUTING
APPLICATIONS

1. Automatic Retrieval and Assembly of Work Groups

A company often requires a team of skilled personnel
(whose qualifications are specifically suited to the task at
hand). For large corporations it is difficult to keep track of
skill sets of its own internal employees. Conversely for small
companies finding such skill outside is often essential. The
present system may thus organize such groups accordingly.
The purpose of the system is to emulate expert human
organizers of work teams and make recommendations as to
an appropriately qualified team of available people for
the task at hand based upon the stated objectives and
required tasks of a prospective project. Using the presently
described technique using relevance feedback it is possible
to match the profile of the project with that of the available
pool of individuals. The organizer may wish to keep in mind
a variety of considerations in selecting teams, for example
considering a variety of qualifications, psychographics and
attributes pertaining to the user's profile as developed from
his/her professional on-line activities and interactions. In
view of the fact that some skill requirements exist in
overlapping disciplines, the more diversity
(complementarity) of skills among its members, the greater the
likelihood of covering the (important) skill requirements
adequately (suggesting that the greater the complementarity
of attributes characterizing the users' base of qualifications
and information content interaction, the more synergistic the
work process). Another consideration may be to find the
fewest number of individuals possible who collectively
cover the apparent skill requirements. Still another consideration
is to favor the reorganizing of groups which had
previously proved themselves by arriving at a successful
solution or product to a similar problem or task. Throughout
the work process certain sub-problems may require temporary
consultation with appropriately qualified individuals
who are more qualified than members of the present team.
Each member of a virtual work group (whether intra-organizational
or inter-organizational) may be prescribed
attributes by a superior such as credentials, observed skill
sets from past experiences and psychographics. This method
may also be used to observe patterns relating to what types
of users are granted access to what type of informational
content, e.g., some members of a team may be made privy to
some information which others are not in accordance with
the methods suggested above. The system may present
recommendations which restrict what types of data can be
accessed by that user. Some restriction attributes may be
explicit, indicating documents containing which word
attributes a user is forbidden to access. For others the
restriction may be based upon explicit criteria, for example
including documents containing words which tend to
co-occur (are metrically close) to those explicitly mentioned.
Or relative attribute weighting values may be used as
thresholds for determining automatically user document
access and privileges. Another (appropriateness function)
based criteria which may be used as well is the similarity
measure between the document and user profiles. In this case
it may be useful to automatically generate explicit rules
which may present the user profile (with relative attribute
weights) as well as that of the document to the authorized
decision maker. Additionally, as suggested in the e-mail
filter section (above) a fully trained system may additionally
automatically present the rules (appropriateness functions)
which it has written through passive training. Thus the user
may again (in this case for automatically determining document
access privileges) approve the rules presented or
modify them accordingly. Documents may also in some cases be retrievable in segments (which lack forbidden terms) if the authorized credential granting party so allows. In the case where documents or document segments and corresponding individuals are ascribed manual restriction attributes, user restriction attributes may act as a restriction for either adding to or deleting from relevant documents (or segments) or prohibiting access altogether as suggested. These restrictions may be integrated with the document file such that authorization credentials may act as a decryption key should the document be transmitted or conveyed elsewhere (e.g., outside of the data base) enabling these access restrictions to apply to any users accessing that information anywhere. As suggested, it can be appreciated that the [illegible] applications of the virtual dialogues (live or recorded) i.e., matching users with virtual meetings and the above e-mail and telephony router in both cases wherein users are granted attribute based privileges to access (or denial for accessing) certain dialogues in accordance with their content.

In one exemplary approach, a virtual work group is assembled for engineering a product, in another authoring or editing a document, in another arrive at corporate policy for a particular need or unresolved issue or for the purposes of creating virtual breakout sessions within an on-line conference (multi organizational) or corporate meeting. Many other examples are possible.

2. Virtual Meetings

Particularly within large organizations, it is advantageous to disseminate company (inside) news and information to those employees for whom the information is "valuable". Using the same basic profiling techniques (above), virtual dialogues (either physical meetings or entirely virtual meetings, either e-mail or telephony based) may be automatically profiled on the fly and used for responsive indexing and notification of those users to whom the information is valuable (and to whom it is privy). As the content of such a dialogue may change with time, new users may be prompted to join while others may be prompted or alternatively (for confidentiality reasons) may be mandated to depart. Text summarization techniques may also be used to allow relevant users who missed the virtual meeting to have access to a synopsized version thereof. Document profiles of such meetings may also be organized into a hierarchical cluster tree using automatic cluster labeling or relevant terms within each cluster (Steve's reference hierarchical cluster menu trees from previous patent). This technique is useful for intuitive browsing of large archives of this information. Digital credentials may be prescribed to each employee by superiors which indicate for him/her the specific information contexts (by clusters) which are mandatory, which are recommended, which are neutral, and which are inappropriate for the employee to either access or (for the mandatory credential) require also mandatory (real-time) attendance. A scheduling agent may be used to organize meeting times in advance by contacting and informing the most relevant users as to the stated objectives of the meeting. This is done by coordinating available time slots to optimize the availability of the greatest number of the highest relevance users to the dialogue (the user may also indicate among his/her available time the level of convenience as well). As above suggested, in virtual work groups a virtual meeting's objective may be to solve a particular problem, and develop a strategy, plan or proposal the stated objective of which may be used to index a virtual group whose complement and skills provides an optimal solution thereto.

3. Monitoring Dialogues

The above present methods may be used for retrieving documents by organizations to determine the relevance of the internal correspondence (e-mail, fax, telephony and recorded physical dialogues) to the interests of the user as stated or exemplified. Thus all irrelevant correspondences are filtered out. Relevant ones may accordingly be clustered (labeled) and organized into a hierarchical cluster menu tree for intuitive browsing as above described. For example, an employer may wish to "listen in" on certain types of correspondences with a particular client by a particular employee (via phone number and voice ID using Neural Net [illegible] correspondences. In one approach fax, e-mail and telephone communications [illegible] may be monitored and indexed advised similarly in order to enable the system to develop aggregate profiles for a given employee for both outgoing and incoming forms of each desired communication media which is used for purposes of routing.

A supervisor may designate particular clusters to be directly relevant, indirectly relevant or irrelevant to a given user's employment duties. For any employee, a summary report of his/her work profile may be automatically e-mailed periodically and/or notification made if, for example, a certain irrelevant or indirectly relevant cluster exceeds a certain threshold or it may notify the superior if "unusual" patterns are detected or manually entered key word detection may be used in certain instances. In this regard, the attribute of time may be useful in determining whether or not irrelevant dialogues are occurring during scheduled work times. Length and frequency of the correspondences are additional useful attributes. Each cluster of interest may also be broken down to reveal the full profile of each associated correspondence. In an application variation the present technique may be used for purposes of monitoring activities and general behavior of children by parents or on-line scholastic (navigation) behavior by teachers. Phone companies may also apply this technique to better monitor communications channels for suspicious activities. In each of the above applications, an additional or alternative feature is the ability of the authoritative party to place restrictions on particular domains. These may be either explicitly mentioned attributes or examples or those which are metrically "similar" to the same. In one variation, a caller's identity (via incoming phone number and/or Neural Net based voice ID) is determinable.

4. Virtual Classroom

In one approach school activities (from either one or a large number of schools) may be accessible for participation remotely. Classroom lectures, continuing education seminars, conferences, tutorials for job training (or on-going job training requirements) may apply. The most exemplary application however is the virtual classroom. Students may use nearest neighbor indexing to either describe or present a particular topic or problems or a query. The system will recommend the most appropriate on-line lecture either live, if the student wishes to interact (e.g., recommending the next scheduled time) or the most appropriate pre-recorded lecture. For solutions to problems, a virtual tutor involving (either a live or pre-recorded single (closed) session or multi-student session may be presented similarly) or the student may receive a recommendation of the name of the most skilled or experienced faculty or student recommended tutor. In the classroom application the student may either present questions on-line to the lecturer (throughout the lecture or at pre-designated intervals) or the best ones may be selected by a moderator.
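
As an illustration only, the nearest-neighbor lecture recommendation described in the virtual classroom passage above might be sketched as follows. The sketch assumes that each lecture is profiled by the relative frequency of the words in its description and that profiles are compared by cosine similarity; the specification does not fix the similarity measure, and the function names, field names and sample lectures below are hypothetical.

```python
import math
from collections import Counter


def profile(text):
    """Word -> relative frequency profile (an assumed, simplified profile)."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse profiles (assumed measure)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def recommend_lecture(query, lectures, live_only=False):
    """Return the title of the lecture nearest to the student's query.

    live_only=True restricts the search to live lectures, for a student
    who wishes to interact; otherwise pre-recorded lectures also qualify.
    """
    q = profile(query)
    candidates = [lec for lec in lectures if lec["live"] or not live_only]
    if not candidates:
        return None
    best = max(candidates, key=lambda lec: cosine(q, profile(lec["description"])))
    return best["title"]


# Hypothetical lecture catalogue.
lectures = [
    {"title": "Linear Algebra Review",
     "description": "matrix vector eigenvalue linear systems", "live": False},
    {"title": "Intro to Probability",
     "description": "random variable probability distribution expectation", "live": True},
    {"title": "Eigenvalue Workshop",
     "description": "eigenvalue eigenvector matrix diagonalization", "live": True},
]

print(recommend_lecture("eigenvalue of a matrix", lectures))  # Eigenvalue Workshop
```

A fuller system would presumably weight terms (for example by a TF/IDF measure) rather than use raw relative frequencies; the raw form is used here only to keep the sketch short.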
Additionally, if/when desired, sub-dialogues may occur between attendees in the absence of the others. This is also one application of joint user navigation as the presenter (lecturer or student presenting a question) presents questions, content or solutions or navigates through informational spaces in joint collaborative fashion for all attendees (or those designated by the presenter).

In one variation students who are most in need of a definable domain (attribute/cluster) indicated by their request or lack of proficiency as evidenced in quiz or test scores may be matched with other students or tutors who are proficient in those areas. In one approach students may be matched for purposes of collaborative study sessions in which priority is given to those which possess the greater degree of complementarity within their respective domains of proficiency/deficiency. The present clustering model may further facilitate the predictive accuracy of the content domains in which a student is expected to be proficient. For example, in the pure clustering model, it is possible to make associations between which domains a student is LIKELY to be proficient in according to areas of previous proficiency (within the same class or different ones based upon historical data from previous students). It can be appreciated that the present system may readily be applied also to corporate or professional application including organizational training sessions, continuing education or conference seminars.

5. Virtual Communities Developed Around Product Genres, Categories, or Items.

The most "interested" users for a particular topic or target object (e.g., or limited to selected exemplary target objects) may be automatically matched for a virtual dialogue which is accessible directly from the target object of interest while browsing. This virtual dialogue includes standard bbs, IRC, Internet telephone and video telephony. Applications include store front products (and categories), musical albums, movies, stocks (or mutual funds). In one approach the criteria for creating a virtual group or matching people one on one is to find among "similar" users the greatest degree of complementarity (difference) in their respective experiences. Thus optimizing the conditions for the users to share invaluable knowledge between one another [...] business venture, a regional or national economy.

6. (Ancillary Inclusion) Hybrid TV/PC

In TV units which have integrated dual mode capabilities for TV and PC functionalities simultaneously (e.g., viewing TV programming while sending/receiving e-mail) the VCS agent may be used not only to point the users to the most appropriate TV programming for their interest at any given time (selectively refer to and/or transcribe Home Video Club patent) but it may also bring the participating viewers of a program to the attention of each other thus allowing viewers to exchange comments or share perspectives about the programming before, during or after the program. Within these user circles VCS may further narrow the criteria of interacting users by their specific viewing profiles.

7. Physical Meetings

In one exemplary approach VCS organized communities may meet in physical forums (e.g., where all the members are required to live in a physically close proximity as a prerequisite for matching) for example organizing meetings or according to general criteria (e.g., socializing and gathering in a restaurant/night club, concert or movie theater) or alternatively wherein a human or machine designated theme is the basis for the community for example a meeting around a political or a community related issue, an item of common interest within a large organization, a vacation destination (which all of the members are likely to wish to visit in the future and wherein a date could be scheduled using a scheduling agent for a group tour). Such a community could for example, be developed around such a travel destination as part of a travel agent's Web site as a marketing pitch for soliciting a trip.

SUMMARY

A method has been presented for automatically selecting articles of interest to a user. The method generates sets of search profiles for the users based on such attributes as the relative frequency of occurrence of words in the articles read by the users, and uses these search profiles to efficiently identify future articles of interest. The method is characterized by passive monitoring (users do not need to explicitly rate the articles), multiple search profiles per user (reflecting interest in multiple topics) and use of elements of the search profiles which are automatically determined from the data (notably, the TF/IDF measure based on word frequencies and descriptions of purchasable items). A method has also been presented for automatically generating menus to allow users to locate and retrieve articles on topics of interest. This method clusters articles based on their similarity, as measured by the relative frequency of word occurrences. Clusters are labeled either with article titles or words extracted from the article. The method can be applied to large sets of articles distributed over many machines.

It has been further shown how to extend the above methods from articles to any class of target objects for which profiles can be generated, including news articles, reference work articles, electronic mail, product or service descriptions, people (based on the articles they read, demographic data, or the products they buy), and electronic bulletin boards (based on the articles posted to them). A particular consequence of being able to group people by interests is that one can form virtual communities of people of common interest, who can then correspond with one another via electronic mail.

We claim:

1. A method for providing a user with access to selected ones of a plurality of target object bulletin boards that are accessible via an electronic data transmission media, where said users are connected via user terminals and data communication connections to a server system which provides access to said electronic data transmission media, said method comprising the steps of:
  automatically generating target profiles for target object bulletin boards that are accessible by said electronic data transmission media, each of said target profiles being generated from the contents of an associated one of said target object bulletin boards;
  automatically generating at least one user target profile interest summary for a user at a user terminal, each said user target profile interest summary being generated from ones of said target object bulletin boards accessed by said user; and
  enabling access to said plurality of target object bulletin boards accessible by said electronic data transmission media by users via said target profile, comprising:
    automatically creating virtual communities of users of said target object bulletin boards, comprising:
      scanning bulletin board postings to existing target object bulletin boards,
      identifying groups of user identifications whose associated users have common interests,
      matching users with other like inclined users to create a new target object bulletin board.
2. The method for providing a user with access to selected ones of a plurality of target object bulletin boards of claim 1, wherein said step of automatically creating further comprises:
  dynamically creating electronic mailing lists for said users matched by said step of matching.

3. The method for providing a user with access to selected ones of a plurality of target object bulletin boards of claim 2, wherein said step of automatically creating further comprises:
  automatically transmitting a notification to said users matched by said step of matching to identify said new target object bulletin board to said ones of said associated users.

4. The method for providing a user with access to selected ones of a plurality of target object bulletin boards of claim 1, wherein said step of automatically creating further comprises:
  continuing to enroll additional users in said new target object bulletin board.

5. A method for providing a user with access to selected ones of a plurality of target object bulletin boards that are accessible via an electronic data transmission media, where said users are connected via user terminals and data communication connections to a server system which provides access to said electronic data transmission media, said method comprising the steps of:
  automatically generating target profiles for target object bulletin boards that are accessible by said electronic data transmission media, each of said target profiles being generated from the contents of an associated one of said target object bulletin boards comprising:
    generating a target profile comprising the cluster profile for a cluster of documents posted on said new target object bulletin board;
  automatically generating at least one user target profile interest summary for a user at a user terminal, each said user target profile interest summary being generated from ones of said target object bulletin boards accessed by said user; and
  enabling access to said plurality of target object bulletin boards accessible by said electronic data transmission media by users via said target profile.

6. A method of operating a network-based agent to seek out users of a network with common interests, where said users are connected via user terminals and data communication connections to a server system which provides access to an electronic data transmission media, comprising the steps of:
  dynamically creating bulletin boards for said users, comprising:
    scanning bulletin board postings to existing bulletin boards,
    identifying a group of users who have common interests,
    matching users with other like inclined users in said identified group to create a proposed new bulletin board.

7. The method of operating a network-based agent of claim 6 wherein said step of scanning bulletin boards comprises:
  automatically generating target profiles for bulletin boards that are accessible by said electronic data transmission media, each of said target profiles being generated from the contents of an associated one of said bulletin boards.

8. The method of operating a network-based agent of claim 7, wherein said step of automatically generating target profiles comprises:
  generating a target profile comprising the cluster profile for a cluster of documents posted on said bulletin boards.

9. The method of operating a network-based agent of claim [illegible], wherein said step of identifying a group of users comprises:
  automatically generating at least one user target profile interest summary for a user at a user terminal, each said user target profile interest summary being generated from ones of said bulletin boards accessed by said user.

10. The method of operating a network-based agent of claim 6, wherein said step of automatically creating further comprises:
  dynamically creating electronic mailing lists for said users matched by said step of matching.

11. The method of operating a network-based agent of claim 10, wherein said step of automatically creating further comprises:
  automatically transmitting a notification to said users matched by said step of matching to identify said proposed new bulletin board to said ones of said associated users.

12. The method of operating a network-based agent of claim [illegible], wherein said step of automatically creating further comprises:
  continuing to enroll additional users in said proposed new bulletin board.

13. The method of operating a network-based agent of claim [illegible], wherein said step of matching comprises:
  identifying an existing bulletin board whose mean profile of set of messages recently posted therein is within a threshold distance of the cluster profile of said proposed new bulletin board.

14. The method of operating a network-based agent of claim 13, further comprising the step of:
  automatically transmitting a notification to said users matched by said step of matching to identify said existing bulletin board to said ones of said associated users.

15. The method of operating a network-based agent of claim 14, wherein said step of automatically transmitting a notification comprises:
  transmitting to said users matched by said step of matching an indication at least one of the data comprising an indication of common interest including: a list of titles of messages recently sent to the bulletin board, an introductory message provided by the bulletin board, a label that identifies the content of the cluster profile that was used to identify the existing bulletin board.

* * * * *
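
For illustration only, and forming no part of the claims, the threshold test recited in claim 13 might be sketched as follows. The sketch assumes word relative-frequency profiles (a simplification of the TF/IDF measure mentioned in the Summary), Euclidean distance as the metric, and a rule that returns the closest qualifying board; none of these choices is fixed by the claim, and all function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_profile(text):
    """Word -> relative frequency profile of one message (assumed form)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def mean_profile(profiles):
    """Term-by-term average of a list of sparse profiles."""
    if not profiles:
        return {}
    sums = defaultdict(float)
    for p in profiles:
        for w, v in p.items():
            sums[w] += v
    return {w: v / len(profiles) for w, v in sums.items()}


def distance(p, q):
    """Euclidean distance between two sparse profiles (assumed metric)."""
    terms = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in terms))


def find_existing_board(cluster_profile, boards, threshold):
    """Return the closest board whose mean recent-message profile lies
    within `threshold` of the cluster profile, or None if none qualifies.

    boards maps a board name to a list of recently posted message texts.
    """
    best_name, best_dist = None, threshold
    for name, messages in boards.items():
        d = distance(mean_profile([tf_profile(m) for m in messages]), cluster_profile)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


# Hypothetical boards and a hypothetical cluster profile for a proposed board.
boards = {
    "gardening": ["tomato soil compost", "soil tomato water"],
    "chess": ["opening gambit endgame", "gambit knight opening"],
}
cluster = tf_profile("tomato compost soil water")

print(find_existing_board(cluster, boards, threshold=0.3))  # gardening
print(find_existing_board(cluster, boards, threshold=0.1))  # None
```

When a board is found, the matched users could be notified of the existing board instead of a new one being created, as recited in claim 14.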